Usage

Warning
Using a local IEDB installation is strongly recommended for larger datasets or when making predictions for many alleles, epitope lengths, or prediction algorithms. More information on how to install IEDB locally can be found on the Installation page.
usage: pvacseq run [-h] [--iedb-install-directory IEDB_INSTALL_DIRECTORY]
                   [-r IEDB_RETRIES] [-k] [-t N_THREADS]
                   [--netmhciipan-version {4.3,4.2,4.1,4.0}]
                   [-e1 CLASS_I_EPITOPE_LENGTH] [-e2 CLASS_II_EPITOPE_LENGTH]
                   [-b BINDING_THRESHOLD]
                   [--percentile-threshold PERCENTILE_THRESHOLD]
                   [--percentile-threshold-strategy {conservative,exploratory}]
                   [--allele-specific-binding-thresholds] [-m {lowest,median}]
                   [-m2 {ic50,percentile}] [--pass-only]
                   [--normal-sample-name NORMAL_SAMPLE_NAME]
                   [--normal-cov NORMAL_COV] [--tdna-cov TDNA_COV]
                   [--trna-cov TRNA_COV] [--normal-vaf NORMAL_VAF]
                   [--tdna-vaf TDNA_VAF] [--trna-vaf TRNA_VAF]
                   [--tumor-purity TUMOR_PURITY]
                   [--transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY]
                   [--maximum-transcript-support-level {1,2,3,4,5}]
                   [--biotypes BIOTYPES] [--allow-incomplete-transcripts]
                   [--net-chop-method {cterm,20s}] [--netmhc-stab]
                   [--net-chop-threshold NET_CHOP_THRESHOLD]
                   [--problematic-amino-acids PROBLEMATIC_AMINO_ACIDS]
                   [--run-reference-proteome-similarity]
                   [--blastp-path BLASTP_PATH]
                   [--blastp-db {refseq_select_prot,refseq_protein}]
                   [--peptide-fasta PEPTIDE_FASTA] [-a {sample_name}]
                   [-s FASTA_SIZE] [--exclude-NAs]
                   [-d DOWNSTREAM_SEQUENCE_LENGTH]
                   [--genes-of-interest-file GENES_OF_INTEREST_FILE]
                   [--aggregate-inclusion-binding-threshold AGGREGATE_INCLUSION_BINDING_THRESHOLD]
                   [--aggregate-inclusion-count-limit AGGREGATE_INCLUSION_COUNT_LIMIT]
                   [-p PHASED_PROXIMAL_VARIANTS_VCF] [-c MINIMUM_FOLD_CHANGE]
                   [--allele-specific-anchors]
                   [--anchor-contribution-threshold ANCHOR_CONTRIBUTION_THRESHOLD]
                   [--expn-val EXPN_VAL]
                   input_file sample_name allele
                   {BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
                   [{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii} ...]
                   output_dir

Run the pVACseq pipeline

positional arguments:
  input_file            A VEP-annotated single- or multi-sample VCF containing
                        genotype, transcript, Wildtype protein sequence, and
                        Frameshift protein sequence information. The VCF may be
                        gzipped (requires tabix index).
  sample_name           The name of the tumor sample being processed. When
                        processing a multi-sample VCF the sample name must be
                        a sample ID in the input VCF #CHROM header line.
  allele                Name of the allele to use for epitope prediction.
                        Multiple alleles can be specified using a comma-
                        separated list. For a list of available alleles, use:
                        `pvacseq valid_alleles`.
  {BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
                        The epitope prediction algorithms to use. Multiple
                        prediction algorithms can be specified, separated by
                        spaces.
  output_dir            The directory for writing all result files.

optional arguments:
  -h, --help            show this help message and exit
  --iedb-install-directory IEDB_INSTALL_DIRECTORY
                        Directory that contains the local installation of IEDB
                        MHC I and/or MHC II. (default: None)
  -r IEDB_RETRIES, --iedb-retries IEDB_RETRIES
                        Number of retries when making requests to the IEDB
                        RESTful web interface. Must be less than or equal to
                        100. (default: 5)
  -k, --keep-tmp-files  Keep intermediate output files. This might be useful
                        for debugging purposes. (default: False)
  -t N_THREADS, --n-threads N_THREADS
                        Number of threads to use for parallelizing peptide-MHC
                        binding prediction calls. (default: 1)
  --netmhciipan-version {4.3,4.2,4.1,4.0}
                        Specify the version of NetMHCIIpan or NetMHCIIpanEL to
                        be used during the run. (default: 4.1)
  -e1 CLASS_I_EPITOPE_LENGTH, --class-i-epitope-length CLASS_I_EPITOPE_LENGTH
                        Length of MHC Class I subpeptides (neoepitopes) to
                        predict. Multiple epitope lengths can be specified
                        using a comma-separated list. Typical epitope lengths
                        range from 8 to 15. Required for Class I prediction
                        algorithms. (default: [8, 9, 10, 11])
  -e2 CLASS_II_EPITOPE_LENGTH, --class-ii-epitope-length CLASS_II_EPITOPE_LENGTH
                        Length of MHC Class II subpeptides (neoepitopes) to
                        predict. Multiple epitope lengths can be specified
                        using a comma-separated list. Typical epitope lengths
                        range from 11 to 30. Required for Class II prediction
                        algorithms. (default: [12, 13, 14, 15, 16, 17, 18])
  -b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
                        When creating the filtered.tsv report, only include
                        epitopes where the mutant allele has ic50 binding
                        scores below this value. When creating the
                        aggregated.tsv report, only bin candidates into the
                        Pass tier that meet this threshold. (default: 500)
  --percentile-threshold PERCENTILE_THRESHOLD
                        When creating the filtered.tsv report, only include
                        epitopes where the mutant allele has a percentile rank
                        below this value. When creating the aggregated.tsv
                        report, only bin candidates into the Pass tier that
                        meet this threshold. (default: None)
  --percentile-threshold-strategy {conservative,exploratory}
                        Specify the candidate inclusion strategy. The
                        'conservative' option requires a candidate to pass
                        BOTH the binding threshold and percentile threshold
                        (default). The 'exploratory' option requires a
                        candidate to pass EITHER the binding threshold or the
                        percentile threshold. (default: conservative)
  --allele-specific-binding-thresholds
                        Use allele-specific binding thresholds. To print the
                        allele-specific binding thresholds run `pvacseq
                        allele_specific_cutoffs`. If an allele does not have a
                        special threshold value, the `--binding-threshold`
                        value will be used. (default: False)
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use when filtering epitopes
                        by binding-threshold or minimum fold change. lowest:
                        Use the best MT Score and Corresponding Fold Change
                        (i.e. the lowest MT ic50 binding score and
                        corresponding fold change of all chosen prediction
                        methods). median: Use the median MT Score and Median
                        Fold Change (i.e. the median MT ic50 binding score and
                        fold change of all chosen prediction methods).
                        (default: median)
  -m2 {ic50,percentile}, --top-score-metric2 {ic50,percentile}
                        Whether to use median/best IC50 or to use median/best
                        percentile score when determining the best peptide in
                        the aggregated report and the top score filter
                        (filtered report). This parameter is also used to
                        influence the primary sorting criteria in the
                        aggregated report for the candidates within each tier
                        as well as in the filtered report. (default: ic50)
  --pass-only           Only process VCF entries with a PASS status. (default:
                        False)
  --normal-sample-name NORMAL_SAMPLE_NAME
                        In a multi-sample VCF, the name of the matched normal
                        sample. (default: None)
  --normal-cov NORMAL_COV
                        Normal Coverage Cutoff. When creating the filtered.tsv
                        report, only include epitopes with a normal read depth
                        above this cutoff. (default: 5)
  --tdna-cov TDNA_COV   Tumor DNA Coverage Cutoff. When creating the
                        filtered.tsv report, only include epitopes with a
                        tumor DNA read depth above this cutoff. (default: 10)
  --trna-cov TRNA_COV   Tumor RNA Coverage Cutoff. When creating the
                        filtered.tsv report, only include epitopes with a
                        tumor RNA read depth above this cutoff. (default: 10)
  --normal-vaf NORMAL_VAF
                        Normal VAF Cutoff in decimal format. When creating the
                        filtered.tsv report, only include epitopes with a
                        normal VAF BELOW this cutoff. (default: 0.02)
  --tdna-vaf TDNA_VAF   Tumor DNA VAF Cutoff in decimal format. When creating
                        the filtered.tsv report, only include epitopes with a
                        tumor DNA VAF above this cutoff. When creating the
                        aggregated.tsv report, use this cutoff to determine if
                        a candidate is subclonal or not. Only clonal
                        candidates will be binned into the Pass tier.
                        (default: 0.25)
  --trna-vaf TRNA_VAF   Tumor RNA VAF Cutoff in decimal format. When creating
                        the filtered.tsv report, only include epitopes with a
                        tumor RNA VAF above this cutoff. This parameter is
                        also used in combination with the --expn-val cutoff
                        for tiering candidates when creating the
                        aggregated.tsv report. The allele expression cutoff
                        is calculated as --trna-vaf * --expn-val * 10. Only
                        candidates with Allele Expr (RNA Expr * RNA VAF) above
                        the allele expr cutoff will be binned into the Pass
                        tier. (default: 0.25)
  --tumor-purity TUMOR_PURITY
                        Value between 0 and 1 indicating the fraction of tumor
                        cells in the tumor sample. This value is used during
                        aggregate report creation for a simple estimation of
                        whether variants are subclonal or clonal based on VAF.
                        If not provided, purity is estimated directly from the
                        VAFs. (default: None)
  --transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY
                        Specify the criteria to consider when prioritizing and
                        tiering candidates during aggregate report creation,
                        and when filtering transcripts during creation of the
                        filtered.tsv report. 'canonical' will
                        prioritize/select candidates resulting from variants
                        on an Ensembl canonical transcript. 'mane_select' will
                        prioritize/select candidates resulting from variants
                        on a MANE Select transcript. 'tsl' will
                        prioritize/select candidates where the transcript
                        support level (TSL) meets the
                        --maximum-transcript-support-level cutoff. When more
                        than one criterion is selected, a transcript meeting
                        ANY of the selected criteria will be
                        prioritized/selected. A candidate is binned into the
                        Pass tier only if its best transcript meets at least
                        one of the selected criteria. (default: ['canonical',
                        'mane_select', 'tsl'])
  --maximum-transcript-support-level {1,2,3,4,5}
                        The threshold to use for filtering epitopes on the
                        Ensembl transcript support level (TSL). Epitopes from
                        a transcript with a support level less than or equal
                        to this cutoff will be considered to come from a good
                        transcript if 'tsl' is one of the selected transcript
                        prioritization strategy options. (default: 1)
  --biotypes BIOTYPES   A list of biotypes to use for pre-filtering
                        transcripts for processing in the pipeline. (default:
                        ['protein_coding'])
  --allow-incomplete-transcripts
                        By default, transcripts annotated with incomplete CDS
                        (i.e., 'cds_start_NF' or 'cds_end_NF' flags in the VEP
                        CSQ field) are excluded from analysis, as they often
                        produce invalid protein sequences. Use this flag to
                        allow candidates from such transcripts. Only peptides
                        that do not contain 'X' will be included. These
                        candidates will be deprioritized relative to those
                        from transcripts without incomplete CDS flags.
                        (default: False)
  --net-chop-method {cterm,20s}
                        NetChop prediction method to use ("cterm" for C term
                        3.0, "20s" for 20S 3.0). C-term 3.0 is trained with
                        publicly available MHC class I ligands and the authors
                        believe that it performs best in predicting the
                        boundaries of CTL epitopes. 20S is trained with in
                        vitro degradation data. (default: None)
  --netmhc-stab         Run NetMHCStabPan after all filtering and add
                        stability predictions to predicted epitopes. (default:
                        False)
  --net-chop-threshold NET_CHOP_THRESHOLD
                        NetChop prediction threshold (increasing the threshold
                        results in better specificity, but worse sensitivity).
                        (default: 0.5)
  --problematic-amino-acids PROBLEMATIC_AMINO_ACIDS
                        A list of amino acids to consider as problematic.
                        During aggregate report creation, only candidates
                        without problematic positions will be binned into the
                        Pass tier. Each entry can be specified in one of the
                        following formats: `amino_acid(s)`: One or more
                        one-letter amino acid codes. Any occurrence of this
                        amino acid string, regardless of the position in the
                        epitope, is problematic. When specifying more than one
                        amino acid, they will need to occur together in the
                        specified order. `amino_acid:position`: A one-letter
                        amino acid code, followed by a colon separator,
                        followed by a positive integer position (one-based).
                        The occurrence of this amino acid at the specified
                        position is problematic. E.g., G:2 would check for a
                        Glycine at the second position of the epitope. The
                        N-terminus is defined as position 1.
                        `amino_acid:-position`: A one-letter amino acid code,
                        followed by a colon separator, followed by a negative
                        integer position. The occurrence of this amino acid at
                        the specified position from the end of the epitope is
                        problematic. E.g., G:-3 would check for a Glycine at
                        the third position from the end of the epitope. The
                        C-terminus is defined as position -1. (default: None)
  --run-reference-proteome-similarity
                        Blast peptides against the reference proteome or
                        search for peptides in a reference proteome fasta
                        file. During aggregate report creation, only
                        candidates without a reference proteome match will be
                        binned into the Pass tier. (default: False)
  --blastp-path BLASTP_PATH
                        Blastp installation path. (default: None)
  --blastp-db {refseq_select_prot,refseq_protein}
                        The blastp database to use. (default:
                        refseq_select_prot)
  --peptide-fasta PEPTIDE_FASTA
                        When running the reference proteome similarity step,
                        use this reference peptide FASTA file to find matches
                        instead of blastp. (default: None)
  -a {sample_name}, --additional-report-columns {sample_name}
                        Additional columns to output in the final report. If
                        sample_name is chosen, this will add a column with the
                        sample name in every row of the output. This can be
                        useful if you later want to concatenate results from
                        multiple individuals into a single file. (default:
                        None)
  -s FASTA_SIZE, --fasta-size FASTA_SIZE
                        Number of FASTA entries per IEDB request. For some
                        resource-intensive prediction algorithms like
                        PickPocket and NetMHCpan it might be helpful to reduce
                        this number. Needs to be an even number. (default:
                        200)
  --exclude-NAs         Exclude NA values from the filtered output. (default:
                        False)
  -d DOWNSTREAM_SEQUENCE_LENGTH, --downstream-sequence-length DOWNSTREAM_SEQUENCE_LENGTH
                        Cap to limit the downstream sequence length for
                        frameshifts when creating the FASTA file. Use 'full'
                        to include the full downstream sequence. (default:
                        1000)
  --genes-of-interest-file GENES_OF_INTEREST_FILE
                        A genes of interest file. Predictions resulting from
                        variants on genes in this list will be marked in the
                        result files. The file should be formatted to have
                        each gene on a separate line without a header line. If
                        no file is specified, the Cancer Gene Census list of
                        high-confidence genes is used as the default.
                        (default: None)
  --aggregate-inclusion-binding-threshold AGGREGATE_INCLUSION_BINDING_THRESHOLD
                        Threshold for including epitopes when creating the
                        aggregate report. (default: 5000)
  --aggregate-inclusion-count-limit AGGREGATE_INCLUSION_COUNT_LIMIT
                        Limit neoantigen candidates included in the aggregate
                        report to only the best n candidates per variant.
                        (default: 15)
  -p PHASED_PROXIMAL_VARIANTS_VCF, --phased-proximal-variants-vcf PHASED_PROXIMAL_VARIANTS_VCF
                        A VCF with phased proximal variant information. Must
                        be gzipped and tabix indexed. (default: None)
  -c MINIMUM_FOLD_CHANGE, --minimum-fold-change MINIMUM_FOLD_CHANGE
                        Minimum fold change between mutant (MT) binding score
                        and wild-type (WT) score (fold change = WT/MT). The
                        default is 0, which filters no results, but 1 is often
                        a sensible choice (requiring that binding is better to
                        the MT than WT peptide). This fold change is sometimes
                        referred to as a differential agretopicity index.
                        (default: 0.0)
  --allele-specific-anchors
                        Use allele-specific anchor positions when tiering
                        epitopes in the aggregate report. This option is
                        available for 8, 9, 10, and 11mers and only for HLA-A,
                        B, and C alleles. If this option is not enabled or as
                        a fallback for unsupported lengths and alleles, the
                        default positions of 1, 2, epitope length - 1, and
                        epitope length are used. Please see
                        https://doi.org/10.1101/2020.12.08.416271 for more
                        details. (default: False)
  --anchor-contribution-threshold ANCHOR_CONTRIBUTION_THRESHOLD
                        For determining allele-specific anchors, each position
                        is assigned a score based on how binding is influenced
                        by mutations. From these scores, the relative
                        contribution of each position to the overall binding
                        is calculated. Starting with the highest relative
                        contribution, positions whose scores together account
                        for the selected contribution threshold are assigned
                        as anchor locations. As a result, a higher threshold
                        leads to the inclusion of more positions to be
                        considered anchors. (default: 0.8)
  --expn-val EXPN_VAL   Gene and transcript Expression Cutoff in decimal
                        format. When creating the filtered.tsv report, only
                        include epitopes with gene and transcript expression
                        above this cutoff. This parameter is also used in
                        combination with the --trna-vaf cutoff for tiering
                        candidates when creating the aggregated.tsv report.
                        The allele expression cutoff is calculated as
                        --trna-vaf * --expn-val * 10. Only candidates with Allele
                        Expr (RNA Expr * RNA VAF) above the allele expr cutoff
                        will be binned into the Pass tier. (default: 1.0)
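The options `--binding-threshold`, `--percentile-threshold`, `--percentile-threshold-strategy`, and `--allele-specific-binding-thresholds` combine into a single pass/fail decision. The following is a minimal Python sketch of that decision as the help text above describes it; it is not pVACseq's own implementation. The function `passes_binding` and the allele-to-cutoff mapping are hypothetical, and applying only the ic50 cutoff when no percentile threshold is set is an assumption.

```python
from typing import Dict, Optional


def passes_binding(
    ic50: float,
    percentile: Optional[float],
    allele: str,
    binding_threshold: float = 500.0,
    percentile_threshold: Optional[float] = None,
    strategy: str = "conservative",
    allele_specific_thresholds: Optional[Dict[str, float]] = None,
) -> bool:
    """Hypothetical helper mirroring the documented threshold rules."""
    # --allele-specific-binding-thresholds: alleles without a special value
    # fall back to --binding-threshold.
    threshold = binding_threshold
    if allele_specific_thresholds is not None:
        threshold = allele_specific_thresholds.get(allele, binding_threshold)
    ic50_ok = ic50 < threshold

    # Assumption: without --percentile-threshold only the ic50 cutoff applies.
    if percentile_threshold is None:
        return ic50_ok
    percentile_ok = percentile is not None and percentile < percentile_threshold

    if strategy == "conservative":  # must pass BOTH thresholds
        return ic50_ok and percentile_ok
    if strategy == "exploratory":  # may pass EITHER threshold
        return ic50_ok or percentile_ok
    raise ValueError(f"unknown strategy: {strategy!r}")


# A weak ic50 binder (650 nM) with a strong percentile rank (0.4):
print(passes_binding(650.0, 0.4, "HLA-A*02:01", percentile_threshold=2.0))  # False
print(passes_binding(650.0, 0.4, "HLA-A*02:01", percentile_threshold=2.0,
                     strategy="exploratory"))  # True
```

The same candidate fails under 'conservative' and passes under 'exploratory', which is the practical difference between the two strategies.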
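The `--top-score-metric` option controls how the mutant ic50 scores from several prediction algorithms collapse into one value. A hypothetical sketch follows. The helper name is invented, and fold-change handling is left out because the help text does not fully specify how the median fold change is derived.

```python
from statistics import median
from typing import Dict


def top_mt_score(mt_ic50_by_method: Dict[str, float], metric: str = "median") -> float:
    """Collapse per-algorithm mutant ic50 scores into one value (hypothetical helper)."""
    scores = list(mt_ic50_by_method.values())
    if not scores:
        raise ValueError("at least one prediction method is required")
    if metric == "lowest":
        # Best (lowest) MT ic50 across all chosen prediction methods.
        return min(scores)
    if metric == "median":
        # Median MT ic50 across all chosen prediction methods.
        return float(median(scores))
    raise ValueError(f"unknown metric: {metric!r}")


scores = {"NetMHCpan": 120.0, "MHCflurry": 300.0, "SMM": 80.0}
print(top_mt_score(scores, "lowest"))  # 80.0
print(top_mt_score(scores, "median"))  # 120.0
```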
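The read-depth and VAF cutoffs used for filtered.tsv (`--normal-cov`, `--tdna-cov`, `--trna-cov`, `--normal-vaf`, `--tdna-vaf`, `--trna-vaf`) reduce to one conjunction. The sketch below uses the documented defaults. The `ReadSupport` container is hypothetical, and missing (NA) values, which `--exclude-NAs` governs, are deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class ReadSupport:
    """Hypothetical container for one candidate's read-support values."""
    normal_depth: int
    normal_vaf: float
    tdna_depth: int
    tdna_vaf: float
    trna_depth: int
    trna_vaf: float


def passes_read_support(
    s: ReadSupport,
    normal_cov: int = 5,
    tdna_cov: int = 10,
    trna_cov: int = 10,
    normal_vaf: float = 0.02,
    tdna_vaf: float = 0.25,
    trna_vaf: float = 0.25,
) -> bool:
    # Depths must be ABOVE their cutoffs; the normal VAF must be BELOW its
    # cutoff, while both tumor VAFs must be ABOVE theirs.
    return (
        s.normal_depth > normal_cov
        and s.tdna_depth > tdna_cov
        and s.trna_depth > trna_cov
        and s.normal_vaf < normal_vaf
        and s.tdna_vaf > tdna_vaf
        and s.trna_vaf > trna_vaf
    )


print(passes_read_support(ReadSupport(30, 0.0, 60, 0.41, 45, 0.38)))   # True
print(passes_read_support(ReadSupport(30, 0.05, 60, 0.41, 45, 0.38)))  # False: normal VAF too high
```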
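The allele expression cutoff shared by `--trna-vaf` and `--expn-val` is simple arithmetic. With the defaults it is 0.25 * 1.0 * 10 = 2.5, so on this criterion a candidate needs RNA Expr * RNA VAF above 2.5 to reach the Pass tier. A sketch with hypothetical function names:

```python
def allele_expression(rna_expr: float, rna_vaf: float) -> float:
    # Allele Expr = RNA Expr * RNA VAF
    return rna_expr * rna_vaf


def allele_expression_cutoff(trna_vaf: float = 0.25, expn_val: float = 1.0) -> float:
    # Cutoff = --trna-vaf * --expn-val * 10
    return trna_vaf * expn_val * 10


def passes_allele_expression(rna_expr: float, rna_vaf: float,
                             trna_vaf: float = 0.25, expn_val: float = 1.0) -> bool:
    return allele_expression(rna_expr, rna_vaf) > allele_expression_cutoff(trna_vaf, expn_val)


print(allele_expression_cutoff())           # 2.5 with the default values
print(passes_allele_expression(12.0, 0.3))  # True: 12.0 * 0.3 is about 3.6
print(passes_allele_expression(8.0, 0.3))   # False: 8.0 * 0.3 is about 2.4
```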
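`--transcript-prioritization-strategy` and `--maximum-transcript-support-level` together decide whether a transcript counts as a good one, and meeting any single selected criterion is enough. A minimal sketch, assuming the canonical, MANE Select, and TSL annotations have already been extracted from the VEP CSQ field into the hypothetical `TranscriptInfo` record. The help text does not state how a missing TSL is treated, so treating it as not meeting the cutoff is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TranscriptInfo:
    """Hypothetical per-transcript annotations taken from the VEP CSQ field."""
    is_canonical: bool
    is_mane_select: bool
    tsl: Optional[int]  # Ensembl TSL 1-5; None if not annotated (assumption)


def meets_prioritization(
    t: TranscriptInfo,
    strategy: Sequence[str] = ("canonical", "mane_select", "tsl"),
    maximum_transcript_support_level: int = 1,
) -> bool:
    checks = {
        "canonical": t.is_canonical,
        "mane_select": t.is_mane_select,
        # 'tsl' counts as met when the TSL is <= the configured maximum.
        "tsl": t.tsl is not None and t.tsl <= maximum_transcript_support_level,
    }
    # Meeting ANY one of the selected criteria is sufficient.
    return any(checks[name] for name in strategy)


tsl2 = TranscriptInfo(is_canonical=False, is_mane_select=False, tsl=2)
print(meets_prioritization(tsl2))                                      # False
print(meets_prioritization(tsl2, maximum_transcript_support_level=2))  # True
```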
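The three rule formats accepted by `--problematic-amino-acids` can be checked against an epitope as follows. This is an illustrative sketch, not pVACseq's parser. Here the rules are passed as an already-split Python list, and the function name is hypothetical.

```python
from typing import Iterable


def has_problematic_position(epitope: str, rules: Iterable[str]) -> bool:
    """Check an epitope against --problematic-amino-acids style rules (sketch)."""
    for rule in rules:
        if ":" in rule:
            amino_acid, position_text = rule.split(":", 1)
            position = int(position_text)
            if position > 0:
                index = position - 1             # one-based; N-terminus is 1
            elif position < 0:
                index = len(epitope) + position  # C-terminus is -1
            else:
                raise ValueError(f"position 0 is not valid in rule {rule!r}")
            if 0 <= index < len(epitope) and epitope[index] == amino_acid:
                return True
        elif rule in epitope:
            # Plain amino acid string: contiguous match anywhere, in order.
            return True
    return False


print(has_problematic_position("SGINFEKL", ["G:2"]))       # True
print(has_problematic_position("SIINFGKL", ["G:-3"]))      # True
print(has_problematic_position("SIINFEKL", ["C", "FE"]))   # True ("FE" occurs)
print(has_problematic_position("SIINFEKL", ["C", "G:2"]))  # False
```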
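`--aggregate-inclusion-binding-threshold` and `--aggregate-inclusion-count-limit` act as a per-variant prefilter for the aggregate report. In the hypothetical sketch below, candidates are ranked by mutant ic50 only. In pVACseq the ranking may also depend on `--top-score-metric2`, and the variant identifiers and peptides shown are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def select_for_aggregate(
    rows: Iterable[Tuple[str, str, float]],
    inclusion_binding_threshold: float = 5000.0,
    inclusion_count_limit: int = 15,
) -> Dict[str, List[str]]:
    """Keep at most N best-scoring peptides per variant (hypothetical sketch)."""
    per_variant: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for variant, peptide, mt_ic50 in rows:
        # Assumption: "below the threshold" is a strict comparison on MT ic50.
        if mt_ic50 < inclusion_binding_threshold:
            per_variant[variant].append((mt_ic50, peptide))
    # Rank by ic50 (lower is better) and truncate to the count limit.
    return {
        variant: [peptide for _, peptide in sorted(hits)[:inclusion_count_limit]]
        for variant, hits in per_variant.items()
    }


rows = [
    ("chr1:1000A>T", "AAAAAAAAA", 100.0),
    ("chr1:1000A>T", "CCCCCCCCC", 50.0),
    ("chr1:1000A>T", "DDDDDDDDD", 6000.0),  # above 5000 nM, dropped
    ("chr2:2000G>C", "EEEEEEEEE", 300.0),
]
print(select_for_aggregate(rows, inclusion_count_limit=1))
# {'chr1:1000A>T': ['CCCCCCCCC'], 'chr2:2000G>C': ['EEEEEEEEE']}
```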
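`--minimum-fold-change` compares the wild-type and mutant binding scores through the ratio WT/MT, so a value above 1 means the mutant peptide binds more strongly (has a lower ic50). A sketch with hypothetical function names. Whether the cutoff is inclusive is an assumption, and a missing WT score is not handled.

```python
def fold_change(wt_ic50: float, mt_ic50: float) -> float:
    # fold change (differential agretopicity index) = WT / MT
    return wt_ic50 / mt_ic50


def passes_fold_change(wt_ic50: float, mt_ic50: float,
                       minimum_fold_change: float = 0.0) -> bool:
    # Assumption: the minimum is inclusive. With the default of 0 every
    # candidate passes, since ic50 values are positive.
    return fold_change(wt_ic50, mt_ic50) >= minimum_fold_change


print(fold_change(1000.0, 250.0))             # 4.0: MT binds 4x better
print(passes_fold_change(200.0, 400.0, 1.0))  # False: WT binds better
```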
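The anchor logic behind `--allele-specific-anchors` and `--anchor-contribution-threshold` has two parts: a fixed fallback (positions 1, 2, length - 1, and length) and a cumulative selection over per-position contributions. The sketch below assumes the per-position scores have already been computed; how they are derived is described in the linked preprint and is not reproduced here. The function names are hypothetical, and the scores in the example are invented for illustration.

```python
from typing import Dict, List


def default_anchor_positions(epitope_length: int) -> List[int]:
    # Fallback anchors (one-based): 1, 2, length - 1, length.
    return [1, 2, epitope_length - 1, epitope_length]


def anchors_from_contributions(position_scores: Dict[int, float],
                               threshold: float = 0.8) -> List[int]:
    """Pick anchors whose relative contributions together reach `threshold`."""
    total = sum(position_scores.values())
    ranked = sorted(position_scores.items(), key=lambda item: item[1], reverse=True)
    anchors: List[int] = []
    covered = 0.0
    for position, score in ranked:
        if covered >= threshold:
            break
        anchors.append(position)
        covered += score / total  # relative contribution of this position
    return sorted(anchors)


scores = {1: 0.05, 2: 0.40, 3: 0.05, 5: 0.15, 9: 0.35}  # invented example values
print(default_anchor_positions(9))              # [1, 2, 8, 9]
print(anchors_from_contributions(scores))       # [2, 5, 9]
print(anchors_from_contributions(scores, 0.5))  # [2, 9]
```

Raising the threshold from 0.5 to 0.8 adds position 5, matching the help text's note that a higher threshold yields more anchor positions.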