Usage

Warning

Using a local IEDB installation is strongly recommended for larger datasets or when making predictions for many alleles, epitope lengths, or prediction algorithms. More information on how to install IEDB locally can be found on the Installation page.

It may be necessary to explore the parameter space a bit when running pVACvector. As binding predictions for some sites vary substantially across algorithms, the most conservative settings may result in no valid paths, often due to one “outlier” prediction. Carefully choosing which predictors to run may help ameliorate this issue as well.

In general, setting a lower binding threshold (e.g., 500nM) and using the median binding value (--top-score-metric median) makes it more likely that a valid design is found, while more conservative settings of 1000nM and the lowest/best binding value (--top-score-metric lowest) give more confidence that there are no junctional neoepitopes.
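The effect of the two metrics can be illustrated with a minimal sketch. This is not pVACvector's implementation: the function names and the ic50 values are made up for illustration, and the strict "below the threshold" comparison is an assumption based on the --binding-threshold help text.

```python
from statistics import median

def top_score(ic50_scores, metric="median"):
    """Collapse per-algorithm ic50 predictions (nM) into a single score."""
    if metric == "lowest":
        return min(ic50_scores)
    if metric == "median":
        return median(ic50_scores)
    raise ValueError(f"unknown metric: {metric}")

def epitope_fails_binding(ic50_scores, binding_threshold=500.0, metric="median"):
    # Assumption: an epitope fails when its score is strictly below the threshold,
    # i.e. it is predicted to bind well and is therefore unwanted at a junction.
    return top_score(ic50_scores, metric) < binding_threshold

# Hypothetical predictions from three algorithms; one is an "outlier" strong binder.
scores = [120.0, 900.0, 1500.0]

print(epitope_fails_binding(scores, 500.0, "median"))   # median is 900 nM
print(epitope_fails_binding(scores, 500.0, "lowest"))   # lowest is 120 nM
print(epitope_fails_binding(scores, 1000.0, "median"))  # stricter threshold
```

With the median metric a single outlier prediction does not invalidate the junction, whereas the lowest metric or a 1000nM threshold does.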

When running pVACvector with a --binding-percentile-threshold, the --percentile-threshold-strategy parameter specifies how to evaluate junctional epitopes. The conservative option (the default) fails a junction if a junctional epitope fails EITHER the binding threshold OR the percentile threshold. The exploratory option fails a junction only if a junctional epitope fails BOTH the binding threshold AND the percentile threshold. The exploratory option increases the odds of a successful run (since a junction is less likely to be invalidated) but also increases the odds that a true junctional epitope remains in the design.
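The difference between the two strategies amounts to an OR versus an AND over the two checks. The following is a minimal sketch under stated assumptions, not the actual pVACvector code: the function names are hypothetical, the example values are invented, and "fails" is taken to mean "strictly below the threshold".

```python
def epitope_fails(ic50, percentile, binding_threshold=500.0,
                  percentile_threshold=2.0, strategy="conservative"):
    """Return True if a junctional epitope should invalidate its junction."""
    fails_binding = ic50 < binding_threshold            # assumed strict comparison
    fails_percentile = percentile < percentile_threshold
    if strategy == "conservative":
        return fails_binding or fails_percentile
    if strategy == "exploratory":
        return fails_binding and fails_percentile
    raise ValueError(f"unknown strategy: {strategy}")

def junction_fails(epitopes, **kwargs):
    """A junction fails as soon as any of its (ic50, percentile) epitopes fails."""
    return any(epitope_fails(ic50, pct, **kwargs) for ic50, pct in epitopes)

# Hypothetical junction: the first epitope passes on ic50 but not on percentile.
junction = [(800.0, 1.5), (2000.0, 10.0)]

print(junction_fails(junction, strategy="conservative"))
print(junction_fails(junction, strategy="exploratory"))
```

The same junction is rejected by the conservative strategy and accepted by the exploratory one, which is why the exploratory option yields more candidate paths.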

Running pVACvector with spacer amino acid sequences may help eliminate junctional epitopes. The list of spacers to be tested is specified using the --spacers parameter. Peptide combinations without a spacer can be tested by including None in the list of spacers. The default spacer amino acid sequences are “None”, “AAY”, “HHHH”, “GGS”, “GPGPG”, “HHAA”, “AAL”, “HH”, “HHC”, “HHH”, “HHHD”, “HHL”, “HHHC”. Peptide junctions are tested with each spacer in the order in which they are specified. If a tested spacer results in a valid junction without any well-binding junctional epitopes, that junction will not be tested with any other spacers, even if a different spacer could potentially result in better junction scores. This reduces runtime. If a tested spacer for a junction doesn’t yield a valid junction (i.e., there are well-binding junctional epitopes), the junction is tested with the next spacer in the input list.
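The first-valid-spacer behaviour described above can be sketched as a simple ordered search. This is an illustration only: `first_valid_spacer` and `toy_validator` are hypothetical names, and the validator is a stand-in for the real binding predictions, which score junction-spanning epitopes rather than matching a fixed motif.

```python
from typing import Callable, List, Optional

def first_valid_spacer(left: str, right: str, spacers: List[str],
                       junction_is_valid: Callable[[str], bool]) -> Optional[str]:
    """Try spacers in the given order and stop at the first valid junction.

    Returns the spacer label (the string "None" means "no spacer"), or the
    Python value None if no spacer in the list gives a valid junction.
    """
    for spacer in spacers:
        linker = "" if spacer == "None" else spacer
        if junction_is_valid(left + linker + right):
            return spacer
    return None

tested = []

def toy_validator(construct: str) -> bool:
    # Stand-in for binding prediction: pretend "KLLF" is a well-binding epitope.
    tested.append(construct)
    return "KLLF" not in construct

chosen = first_valid_spacer("AAAKL", "LFGGG", ["None", "AAY", "HHHH"], toy_validator)
print(chosen, tested)
```

In this toy run the junction without a spacer is rejected, "AAY" is accepted, and "HHHH" is never tested. Placing preferred spacers early in the --spacers list therefore matters.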

If, after testing all spacers, no valid path is found, clipped versions of peptides are tested by removing leading and/or trailing amino acids and constructing junctions with the clipped peptides. The maximum number of amino acids to clip is controlled by the --max-clip-length argument.
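As a rough illustration of what clipping means for a single junction, the sketch below enumerates clipped variants of a left and a right peptide. The function name and the order in which variants are generated (least total clipping first) are assumptions made for this example and do not necessarily match pVACvector's internal search order.

```python
def clipped_junction_pairs(left, right, max_clip_length=3):
    """Yield (left_variant, right_variant) pairs for one junction.

    Trailing amino acids are removed from the left peptide and leading
    amino acids from the right peptide, up to max_clip_length each.
    """
    for total in range(1, 2 * max_clip_length + 1):
        for trailing in range(0, max_clip_length + 1):
            leading = total - trailing
            if 0 <= leading <= max_clip_length:
                yield left[:len(left) - trailing], right[leading:]

for pair in clipped_junction_pairs("ABCDE", "VWXYZ", max_clip_length=1):
    print(pair)
```

Each increase of --max-clip-length enlarges the number of junction variants to evaluate, so larger values trade runtime for a better chance of finding a valid path.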

In some cases, the (core) neoantigen candidate of a peptide sequence may be located toward the beginning or end of the sequence. In these cases, clipping may accidentally remove amino acids of the core neoantigen. To prevent this, --max-clip-length should be set to the smallest number of flanking amino acids found on either side of the core neoantigen in any of the peptides to include in the vector. Alternatively, pVACvector also supports specifying the core neoantigen in the FASTA header when using a FASTA file as the input to pVACvector. If the core neoantigens for each sequence are specified in the input FASTA file, pVACvector will not clip into these neoantigens, even if the flanking sequence is shorter than the --max-clip-length. The core neoantigen should be specified like so:

>Peptide1 {"Best Peptide": "LYYSYGLLHI"}
WLYYSYGLLHIYGSGGYALYF

In this example, Peptide1 is the ID of the sequence, LYYSYGLLHI is the core neoantigen candidate, and WLYYSYGLLHIYGSGGYALYF is the peptide sequence to include in the vector. The Best Peptide information will already be included in the FASTA headers if the FASTA file is created by using the pvacseq generate_protein_fasta command in conjunction with an aggregated report TSV as the --input-tsv parameter.
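For FASTA files without Best Peptide headers handled by pVACvector itself, it can be useful to check how much flanking sequence each peptide has before choosing --max-clip-length. The sketch below parses a header in the format shown above and reports the flank lengths. The helper names are hypothetical, and the assumption is that the header consists of the ID, a space, and a JSON object.

```python
import json
from typing import List, Tuple

def flank_lengths(header: str, sequence: str) -> Tuple[str, int, int]:
    """Return (sequence ID, leading flank length, trailing flank length)."""
    seq_id, _, metadata = header.lstrip(">").partition(" ")
    best_peptide = json.loads(metadata)["Best Peptide"]
    start = sequence.find(best_peptide)
    if start == -1:
        raise ValueError(f"core neoantigen not found in sequence {seq_id}")
    return seq_id, start, len(sequence) - start - len(best_peptide)

def suggested_max_clip_length(records: List[Tuple[str, str]]) -> int:
    """Smallest flank on either side across all (header, sequence) records."""
    return min(min(left, right)
               for _, left, right in (flank_lengths(h, s) for h, s in records))

records = [('>Peptide1 {"Best Peptide": "LYYSYGLLHI"}', "WLYYSYGLLHIYGSGGYALYF")]
print(flank_lengths(*records[0]))
print(suggested_max_clip_length(records))
```

For the example record, the core neoantigen has one leading and ten trailing flanking amino acids, so a clip length above 1 could cut into the core unless the Best Peptide header is present.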

If no solution is found after testing all spacers and after clipping peptides, pVACvector will attempt to find a partial solution by excluding peptide sequences. The number of peptide sequences that are allowed to be removed is controlled via the --allow-n-peptide-exclusion parameter. Partial solutions will be written to their own result subdirectory. The subdirectory name reflects which peptide(s) were removed from the partial solution.
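To get a feel for how many partial designs this step may consider, the following sketch enumerates the peptide subsets that remain when up to n peptides are excluded. It is a simplified illustration with hypothetical names. It does not reproduce how pVACvector selects which peptides to drop or how it names the result subdirectories.

```python
from itertools import combinations

def exclusion_candidates(peptide_ids, allow_n_peptide_exclusion=2):
    """Yield (removed, kept) pairs, removing fewer peptides first."""
    for n_removed in range(1, allow_n_peptide_exclusion + 1):
        for removed in combinations(peptide_ids, n_removed):
            kept = [p for p in peptide_ids if p not in removed]
            yield removed, kept

candidates = list(exclusion_candidates(["P1", "P2", "P3", "P4"], 2))
print(len(candidates))
print(candidates[0])
```

The number of candidate subsets grows quickly with the number of peptides and the allowed exclusions, which is one reason to keep --allow-n-peptide-exclusion small.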

Our current recommendation is to run pVACvector several different ways, and choose the path resulting from the most conservative set of parameters.
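One way to organize such a set of runs is to generate the command lines programmatically. The sketch below only builds and prints the commands and does not execute them. The input file, sample name, allele, algorithm choice, and output directory names are placeholders. The options used (-b, -m, --percentile-threshold-strategy) are taken from the usage text on this page.

```python
import shlex

def build_command(input_file, sample_name, allele, algorithms, output_dir,
                  binding_threshold, top_score_metric, strategy):
    """Assemble a `pvacvector run` argument list for one parameter set."""
    return [
        "pvacvector", "run",
        input_file, sample_name, allele, *algorithms, output_dir,
        "-b", str(binding_threshold),
        "-m", top_score_metric,
        "--percentile-threshold-strategy", strategy,
    ]

# Ordered from most to least conservative (placeholder labels and values).
parameter_sets = [
    ("conservative", 1000, "lowest", "conservative"),
    ("intermediate", 500, "lowest", "conservative"),
    ("permissive", 500, "median", "exploratory"),
]

commands = [
    build_command("peptides.fa", "sample1", "HLA-A*02:01",
                  ["MHCflurry", "NetMHCpan"], f"vector_{label}",
                  threshold, metric, strategy)
    for label, threshold, metric, strategy in parameter_sets
]

for command in commands:
    print(shlex.join(command))
```

Writing each run to its own output directory makes it straightforward to pick the result from the most conservative parameter set that still produced a valid path.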

usage: pvacvector run [-h] [--iedb-install-directory IEDB_INSTALL_DIRECTORY]
                      [-r IEDB_RETRIES] [-k] [-t N_THREADS]
                      [--netmhciipan-version {4.3,4.2,4.1,4.0}]
                      [--use-normalized-percentiles]
                      [--reference-scores-path REFERENCE_SCORES_PATH]
                      [-e1 CLASS_I_EPITOPE_LENGTH]
                      [-e2 CLASS_II_EPITOPE_LENGTH] [-b BINDING_THRESHOLD]
                      [--binding-percentile-threshold BINDING_PERCENTILE_THRESHOLD]
                      [--percentile-threshold-strategy {conservative,exploratory}]
                      [--allele-specific-binding-thresholds]
                      [-m {lowest,median}] [--biotypes BIOTYPES]
                      [--allow-incomplete-transcripts] [-v INPUT_VCF]
                      [-n INPUT_N_MER] [--spacers SPACERS]
                      [--max-clip-length MAX_CLIP_LENGTH]
                      [--allow-n-peptide-exclusion ALLOW_N_PEPTIDE_EXCLUSION]
                      input_file sample_name allele
                      {BigMHC_EL,BigMHC_IM,DeepImmuno,ImmuScope_IM,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHC2pred,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
                      [{BigMHC_EL,BigMHC_IM,DeepImmuno,ImmuScope_IM,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHC2pred,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii} ...]
                      output_dir

Run the pVACvector pipeline

positional arguments:
  input_file            A .fa file with peptides or a pVACseq .tsv file with
                        epitopes to use for vector design.
  sample_name           The name of the sample being processed. This will be
                        used as a prefix for output files.
  allele                Name of the allele to use for epitope prediction.
                        Multiple alleles can be specified using a comma-
                        separated list. For a list of available alleles, use:
                        `pvacvector valid_alleles`.
  {BigMHC_EL,BigMHC_IM,DeepImmuno,ImmuScope_IM,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHC2pred,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
                        The epitope prediction algorithms to use. Multiple
                        prediction algorithms can be specified, separated by
                        spaces.
  output_dir            The directory for writing all result files.

optional arguments:
  -h, --help            show this help message and exit
  --iedb-install-directory IEDB_INSTALL_DIRECTORY
                        Directory that contains the local installation of IEDB
                        MHC I and/or MHC II. (default: None)
  -r IEDB_RETRIES, --iedb-retries IEDB_RETRIES
                        Number of retries when making requests to the IEDB
                        RESTful web interface. Must be less than or equal to
                        100. (default: 5)
  -k, --keep-tmp-files  Keep intermediate output files. This might be useful
                        for debugging purposes. (default: False)
  -t N_THREADS, --n-threads N_THREADS
                        Number of threads to use for parallelizing peptide-MHC
                        binding prediction calls. (default: 1)
  --netmhciipan-version {4.3,4.2,4.1,4.0}
                        Specify the version of NetMHCIIpan or NetMHCIIpanEL to
                        be used during the run. (default: 4.1)
  --use-normalized-percentiles
                        When set, calculate normalized percentile scores for
                        all prediction algorithms. For algorithms that do not
                        natively provide percentiles, percentiles will be
                        derived by comparing prediction scores against pre-
                        computed reference distributions. For algorithms that
                        do provide native percentiles, their values will be
                        overwritten with the normalized percentile. (default:
                        False)
  --reference-scores-path REFERENCE_SCORES_PATH
                        Directory to store pre-computed reference percentile
                        files. If a file is missing, it will be downloaded
                        here when --use-normalized-percentiles is set.
                        (default: /tmp)
  -e1 CLASS_I_EPITOPE_LENGTH, --class-i-epitope-length CLASS_I_EPITOPE_LENGTH
                        Length of MHC Class I junctional epitopes to predict.
                        Multiple epitope lengths can be specified using a
                        comma-separated list. Typical epitope lengths vary
                        between 8-15. Required for Class I prediction
                        algorithms. (default: [8, 9, 10, 11])
  -e2 CLASS_II_EPITOPE_LENGTH, --class-ii-epitope-length CLASS_II_EPITOPE_LENGTH
                        Length of MHC Class II junctional epitopes to predict.
                        Multiple epitope lengths can be specified using a
                        comma-separated list. Typical epitope lengths vary
                        between 11-30. Required for Class II prediction
                        algorithms. (default: [12, 13, 14, 15, 16, 17, 18])
  -b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
                        Fail junctions where any junctional epitope has ic50
                        binding scores below this value. (default: 500)
  --binding-percentile-threshold BINDING_PERCENTILE_THRESHOLD
                        Fail junctions where any junctional epitope has a
                        binding percentile rank below this value. (default:
                        2.0)
  --percentile-threshold-strategy {conservative,exploratory}
                        Specify the how to evaluate junctional epitopes if a
                        percentile threshold is set. The 'conservative' option
                        fails a junction if a junctional epitope fails EITHER
                        the binding threshold OR the binding percentile
                        threshold (default). The 'exploratory' option fails a
                        junction only if a junctional epitope fails BOTH the
                        binding threshold AND the binding percentile
                        threshold. (default: conservative)
  --allele-specific-binding-thresholds
                        Use allele-specific binding thresholds when evaluating
                        junctional epitopes. To print the allele-specific
                        binding thresholds run `pvacvector
                        allele_specific_cutoffs`. If an allele does not have a
                        special threshold value, the `--binding-threshold`
                        value will be used. (default: False)
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use when evaluating
                        junctional epitopes by binding-threshold. lowest: Use
                        the best MT Score (i.e. the lowest MT ic50 binding
                        score of all chosen prediction methods). median: Use
                        the median MT Score (i.e. the median MT ic50 binding
                        score of all chosen prediction methods). (default:
                        median)
  --biotypes BIOTYPES   A list of biotypes to use for pre-filtering
                        transcripts when running with an input VCF. (default:
                        ['protein_coding'])
  --allow-incomplete-transcripts
                        By default, transcripts annotated with incomplete CDS
                        (i.e., 'cds_start_NF' or 'cds_end_NF' flags in the VEP
                        CSQ field) are excluded from analysis, as they often
                        produce invalid protein sequences. Use this flag to
                        allow candidates from such transcripts. Only peptides
                        that do not contain 'X' will be included. These
                        candidates will be deprioritized relative to those
                        from transcripts without incomplete CDS flags.
                        (default: False)
  -v INPUT_VCF, --input-vcf INPUT_VCF
                        Path to original pVACseq input VCF file. Required if
                        input file is a pVACseq TSV. (default: None)
  -n INPUT_N_MER, --input-n-mer INPUT_N_MER
                        Length of the peptide sequence to use when creating
                        the FASTA from the pVACseq TSV. (default: 25)
  --spacers SPACERS     Comma-separated list of spacers to use for testing
                        junction epitopes. Include None to test junctions
                        without spacers. Peptide combinations will be tested
                        with each spacer in the order specified. (default: Non
                        e,AAY,HHHH,GGS,GPGPG,HHAA,AAL,HH,HHC,HHH,HHHD,HHL,HHHC
                        )
  --max-clip-length MAX_CLIP_LENGTH
                        Number of amino acids to permit clipping from the
                        start and/or end of peptides in order to test novel
                        junction epitopes when the first pass on the full
                        peptide fails. (default: 3)
  --allow-n-peptide-exclusion ALLOW_N_PEPTIDE_EXCLUSION
                        If no solution is found after adding spacers and
                        clipping peptides, attempt to find partial solutions
                        with up to n peptides removed. (default: 2)
