
Usage

Warning

Using a local IEDB installation is strongly recommended for larger datasets or when making predictions for many alleles, epitope lengths, or prediction algorithms. More information on how to install IEDB locally can be found on the Installation page.

usage: pvacfuse run [-h] [--iedb-install-directory IEDB_INSTALL_DIRECTORY]
                    [-r IEDB_RETRIES] [-k] [-t N_THREADS]
                    [--netmhciipan-version {4.3,4.2,4.1,4.0}]
                    [--use-normalized-percentiles]
                    [--reference-scores-path REFERENCE_SCORES_PATH]
                    [-e1 CLASS_I_EPITOPE_LENGTH] [-e2 CLASS_II_EPITOPE_LENGTH]
                    [-b BINDING_THRESHOLD]
                    [--binding-percentile-threshold BINDING_PERCENTILE_THRESHOLD]
                    [--immunogenicity-percentile-threshold IMMUNOGENICITY_PERCENTILE_THRESHOLD]
                    [--presentation-percentile-threshold PRESENTATION_PERCENTILE_THRESHOLD]
                    [--percentile-threshold-strategy {conservative,exploratory}]
                    [--allele-specific-binding-thresholds]
                    [-m {lowest,median}] [-m2 TOP_SCORE_METRIC2]
                    [--net-chop-method {cterm,20s}] [--netmhc-stab]
                    [--net-chop-threshold NET_CHOP_THRESHOLD]
                    [--problematic-amino-acids PROBLEMATIC_AMINO_ACIDS]
                    [--run-reference-proteome-similarity]
                    [--blastp-path BLASTP_PATH]
                    [--blastp-db {refseq_select_prot,refseq_protein}]
                    [--peptide-fasta PEPTIDE_FASTA] [-a {sample_name}]
                    [-s FASTA_SIZE] [-d DOWNSTREAM_SEQUENCE_LENGTH]
                    [--genes-of-interest-file GENES_OF_INTEREST_FILE]
                    [--aggregate-inclusion-binding-threshold AGGREGATE_INCLUSION_BINDING_THRESHOLD]
                    [--aggregate-inclusion-count-limit AGGREGATE_INCLUSION_COUNT_LIMIT]
                    [--starfusion-file STARFUSION_FILE]
                    [--read-support READ_SUPPORT] [--expn-val EXPN_VAL]
                    input_file sample_name allele
                    {BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
                    [{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii} ...]
                    output_dir

Run the pVACfuse pipeline

positional arguments:
  input_file            An AGFusion output directory or Arriba fusion.tsv
                        output file.
  sample_name           The name of the sample being processed. This will be
                        used as a prefix for output files.
  allele                Name of the allele to use for epitope prediction.
                        Multiple alleles can be specified using a comma-
                        separated list. For a list of available alleles, use:
                        `pvacfuse valid_alleles`.
  {BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
                        The epitope prediction algorithms to use. Multiple
                        prediction algorithms can be specified, separated by
                        spaces.
  output_dir            The directory for writing all result files.

optional arguments:
  -h, --help            show this help message and exit
  --iedb-install-directory IEDB_INSTALL_DIRECTORY
                        Directory that contains the local installation of IEDB
                        MHC I and/or MHC II. (default: None)
  -r IEDB_RETRIES, --iedb-retries IEDB_RETRIES
                        Number of retries when making requests to the IEDB
                        RESTful web interface. Must be less than or equal to
                        100. (default: 5)
  -k, --keep-tmp-files  Keep intermediate output files. This might be useful
                        for debugging purposes. (default: False)
  -t N_THREADS, --n-threads N_THREADS
                        Number of threads to use for parallelizing peptide-MHC
                        binding prediction calls. (default: 1)
  --netmhciipan-version {4.3,4.2,4.1,4.0}
                        Specify the version of NetMHCIIpan or NetMHCIIpanEL to
                        be used during the run. (default: 4.1)
  --use-normalized-percentiles
                        When set, calculate normalized percentile scores for
                        all prediction algorithms. For algorithms that do not
                        natively provide percentiles, percentiles will be
                        derived by comparing prediction scores against pre-
                        computed reference distributions. For algorithms that
                        do provide native percentiles, their values will be
                        overwritten with the normalized percentile. (default:
                        False)
  --reference-scores-path REFERENCE_SCORES_PATH
                        Directory to store pre-computed reference percentile
                        files. If a file is missing, it will be downloaded
                        here when --use-normalized-percentiles is set.
                        (default: /tmp)
  -e1 CLASS_I_EPITOPE_LENGTH, --class-i-epitope-length CLASS_I_EPITOPE_LENGTH
                        Length of MHC Class I subpeptides (neoepitopes) to
                        predict. Multiple epitope lengths can be specified
                        using a comma-separated list. Typical epitope lengths
                        vary between 8-15. Required for Class I prediction
                        algorithms. (default: [8, 9, 10, 11])
  -e2 CLASS_II_EPITOPE_LENGTH, --class-ii-epitope-length CLASS_II_EPITOPE_LENGTH
                        Length of MHC Class II subpeptides (neoepitopes) to
                        predict. Multiple epitope lengths can be specified
                        using a comma-separated list. Typical epitope lengths
                        vary between 11-30. Required for Class II prediction
                        algorithms. (default: [12, 13, 14, 15, 16, 17, 18])
  -b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
                        Report only epitopes where the mutant allele has ic50
                        binding scores below this value. (default: 500)
  --binding-percentile-threshold BINDING_PERCENTILE_THRESHOLD
                        Report only epitopes where the mutant allele has a
                        binding percentile rank below this value. (default:
                        2.0)
  --immunogenicity-percentile-threshold IMMUNOGENICITY_PERCENTILE_THRESHOLD
                        Report only epitopes where the mutant allele has an
                        immunogenicity percentile rank below this value.
                        (default: 2.0)
  --presentation-percentile-threshold PRESENTATION_PERCENTILE_THRESHOLD
                        Report only epitopes where the mutant allele has a
                        presentation percentile rank below this value.
                        (default: 2.0)
  --percentile-threshold-strategy {conservative,exploratory}
                        Specify the candidate inclusion strategy. The
                        'conservative' option requires a candidate to pass
                        BOTH the binding threshold and all percentile
                        thresholds set (default). The 'exploratory' option
                        requires a candidate to pass EITHER the binding
                        threshold or any of the percentile thresholds set.
                        (default: conservative)
  --allele-specific-binding-thresholds
                        Use allele-specific binding thresholds. To print the
                        allele-specific binding thresholds run `pvacfuse
                        allele_specific_cutoffs`. If an allele does not have a
                        special threshold value, the `--binding-threshold`
                        value will be used. (default: False)
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use when filtering epitopes
                        by binding-threshold or minimum fold change. lowest:
                        Use the best MT Score and Corresponding Fold Change
                        (i.e. the lowest MT ic50 binding score and
                        corresponding fold change of all chosen prediction
                        methods). median: Use the median MT Score and Median
                        Fold Change (i.e. the median MT ic50 binding score and
                        fold change of all chosen prediction methods).
                        (default: median)
  -m2 TOP_SCORE_METRIC2, --top-score-metric2 TOP_SCORE_METRIC2
                        Which metrics to consider when selecting the best
                        peptide in the aggregate report and the top score
                        filter step (filtered report). Each specified metric
                        will be ranked and the sum of these ranks will be
                        used. This rank sum is also used as the primary
                        sorting criteria in the aggregated report for the
                        candidates within each tier as well as in the filtered
                        report. Whether the lowest or median is considered for
                        each metric is controlled by the --top-score-metric
                        parameter. (default: ['ic50', 'combined_percentile'])
  --net-chop-method {cterm,20s}
                        NetChop prediction method to use ("cterm" for C term
                        3.0, "20s" for 20S 3.0). C-term 3.0 is trained with
                        publicly available MHC class I ligands and the authors
                        believe that it performs best in predicting the
                        boundaries of CTL epitopes. 20S is trained with in
                        vitro degradation data. (default: None)
  --netmhc-stab         Run NetMHCStabPan after all filtering and add
                        stability predictions to predicted epitopes. (default:
                        False)
  --net-chop-threshold NET_CHOP_THRESHOLD
                        NetChop prediction threshold (increasing the threshold
                        results in better specificity, but worse sensitivity).
                        (default: 0.5)
  --problematic-amino-acids PROBLEMATIC_AMINO_ACIDS
                        A list of amino acids to consider as problematic. Each
                        entry can be specified in the following format:
                        `amino_acid(s)`: One or more one-letter amino acid
                        codes. Any occurrence of this amino acid string,
                        regardless of the position in the epitope, is
                        problematic. When specifying more than one amino acid,
                        they will need to occur together in the specified
                        order. `amino_acid:position`: A one letter amino acid
                        code, followed by a colon separator, followed by a
                        positive integer position (one-based). The occurrence
                        of this amino acid at the position specified is
                        problematic. E.g., G:2 would check for a Glycine at
                        the second position of the epitope. The N-terminus is
                        defined as position 1. `amino_acid:-position`: A one
                        letter amino acid code, followed by a colon separator,
                        followed by a negative integer position. The
                        occurrence of this amino acid at the specified
                        position from the end of the epitope is problematic.
                        E.g., G:-3 would check for a Glycine at the third
                        position from the end of the epitope. The C-terminus
                        is defined as position -1. (default: None)
  --run-reference-proteome-similarity
                        Blast peptides against the reference proteome.
                        (default: False)
  --blastp-path BLASTP_PATH
                        Blastp installation path. (default: None)
  --blastp-db {refseq_select_prot,refseq_protein}
                        The blastp database to use. (default:
                        refseq_select_prot)
  --peptide-fasta PEPTIDE_FASTA
                        When running the reference proteome similarity step,
                        use this reference peptide FASTA file to find matches
                        instead of blastp. (default: None)
  -a {sample_name}, --additional-report-columns {sample_name}
                        Additional columns to output in the final report. If
                        sample_name is chosen, this will add a column with the
                        sample name in every row of the output. This can be
                        useful if you later want to concatenate results from
                        multiple individuals into a single file. (default:
                        None)
  -s FASTA_SIZE, --fasta-size FASTA_SIZE
                        Number of FASTA entries per IEDB request. For some
                        resource-intensive prediction algorithms like
                        PickPocket and NetMHCpan it might be helpful to reduce
                        this number. Needs to be an even number. (default:
                        200)
  -d DOWNSTREAM_SEQUENCE_LENGTH, --downstream-sequence-length DOWNSTREAM_SEQUENCE_LENGTH
                        Cap to limit the downstream sequence length for
                        frameshifts when creating the FASTA file. Use 'full'
                        to include the full downstream sequence. (default:
                        1000)
  --genes-of-interest-file GENES_OF_INTEREST_FILE
                        A genes of interest file. Predictions resulting from
                        variants on genes in this list will be marked in the
                        result files. The file should be formatted to have
                        each gene on a separate line without a header line. If
                        no file is specified, the Cancer Gene Census list of
                        high-confidence genes is used as the default.
                        (default: None)
  --aggregate-inclusion-binding-threshold AGGREGATE_INCLUSION_BINDING_THRESHOLD
                        Threshold for including epitopes when creating the
                        aggregate report (default: 5000)
  --aggregate-inclusion-count-limit AGGREGATE_INCLUSION_COUNT_LIMIT
                        Limit neoantigen candidates included in the aggregate
                        report to only the best n candidates per variant.
                        (default: 15)
  --starfusion-file STARFUSION_FILE
                        Path to a star-fusion.fusion_predictions.tsv or star-
                        fusion.fusion_predictions.abridged.tsv to extract read
                        support and expression information from. When running
                        with AGFusion data, both read support and expression
                        data from this file will be used. When running with
                        Arriba data, only expression data from this file is
                        used while read support data will be parsed from the
                        Arriba data directly. (default: None)
  --read-support READ_SUPPORT
                        Read Support Cutoff. Sites above this cutoff will be
                        considered. (default: 5)
  --expn-val EXPN_VAL   Expression Cutoff. Expression is measured as FFPM
                        (fusion fragments per million total reads). Sites
                        above this cutoff will be considered. (default: 0.1)
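
The three entry formats accepted by --problematic-amino-acids can be easier to follow with a small worked example. The Python sketch below is not pVACfuse's own implementation; it only illustrates the matching rules as the help text describes them. The function names `is_problematic` and `find_problematic_rules` are hypothetical, and the sketch assumes the rules have already been split into individual entries.

```python
from typing import List


def is_problematic(epitope: str, rule: str) -> bool:
    """Return True if `epitope` matches one problematic-amino-acid rule.

    Hypothetical helper: illustrates the documented rule formats only.
    """
    if ":" not in rule:
        # `amino_acid(s)`: the string may occur anywhere in the epitope,
        # and multiple amino acids must occur together in the given order.
        return rule in epitope

    amino_acid, position_text = rule.split(":")
    position = int(position_text)
    if position == 0 or abs(position) > len(epitope):
        # Position 0 is undefined; positions beyond the epitope never match.
        return False

    # Positive positions are one-based from the N-terminus (position 1).
    # Negative positions count back from the C-terminus (position -1),
    # which matches Python's negative indexing directly.
    index = position - 1 if position > 0 else position
    return epitope[index] == amino_acid


def find_problematic_rules(epitope: str, rules: List[str]) -> List[str]:
    """Return the subset of `rules` that flag `epitope` as problematic."""
    return [rule for rule in rules if is_problematic(epitope, rule)]


if __name__ == "__main__":
    # Example epitope chosen only for illustration.
    print(find_problematic_rules("AGCDEFGHK", ["G:2", "G:-3", "CD", "DC", "W"]))
```

For the illustrative epitope `AGCDEFGHK`, the rules `G:2`, `G:-3`, and `CD` match, while `DC` (wrong order) and `W` (absent) do not.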