Usage
Warning
Using a local IEDB installation is strongly recommended for larger datasets or when making predictions for many alleles, epitope lengths, or prediction algorithms. More information on how to install IEDB locally can be found on the Installation page.
It may be necessary to explore the parameter space a bit when running pVACvector. As binding predictions for some sites vary substantially across algorithms, the most conservative settings may result in no valid paths, often due to one “outlier” prediction. Carefully choosing which predictors to run may help ameliorate this issue as well.
In general, setting a lower binding threshold (e.g., 500nM) and using the median
binding value (--top-score-metric median) gives a greater chance of finding
a design, while more conservative settings of 1000nM and the lowest/best binding
value (--top-score-metric lowest) give more confidence that there are
no junctional neoepitopes.
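The interaction between these two settings can be illustrated with a small Python sketch. This is not pVACvector's implementation: the function names and the IC50 values are invented for illustration, and the strict "below the threshold" comparison is taken from the wording of the -b help text.

```python
from statistics import median

def top_score(ic50_by_algorithm, metric="median"):
    """Collapse per-algorithm IC50 predictions (nM) into a single score."""
    scores = list(ic50_by_algorithm.values())
    if metric == "lowest":
        return min(scores)  # best (strongest) predicted binding
    if metric == "median":
        return median(scores)
    raise ValueError(f"unknown metric: {metric}")

def is_junctional_binder(ic50_by_algorithm, binding_threshold=500.0, metric="median"):
    """True if the epitope would count against the junction."""
    return top_score(ic50_by_algorithm, metric) < binding_threshold

# Hypothetical predictions for one junctional epitope; SMM is the "outlier".
predictions = {"MHCflurry": 2400.0, "NetMHCpan": 3100.0, "SMM": 420.0}

permissive = is_junctional_binder(predictions, 500.0, "median")
conservative = is_junctional_binder(predictions, 1000.0, "lowest")
print(permissive, conservative)
```

With these made-up values the median (2400 nM) is above the 500nM threshold, so the junction survives, whereas the lowest value (420 nM) is below the 1000nM threshold, so the single outlier prediction is enough to reject it.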
When running pVACvector with a --percentile-threshold, the --percentile-threshold-strategy
parameter specifies how to evaluate junctional epitopes. The conservative
option fails a junction if a junctional epitope fails EITHER the binding threshold
OR the percentile threshold (default). The exploratory option fails a junction
only if a junctional epitope fails BOTH the binding threshold AND the percentile threshold.
The latter will increase the odds of a successful run (since a junction is less likely to be invalidated) but also increases the odds that a true junctional epitope remains in the design.
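The difference between the two strategies amounts to an OR versus an AND over the two checks. The sketch below is an illustration only, not pVACvector code; the function name, argument names, and example values are assumptions, and "failing" a threshold is taken to mean a value below it, as in the help text.

```python
def epitope_fails(ic50, percentile, binding_threshold=500.0,
                  percentile_threshold=None, strategy="conservative"):
    """Decide whether one junctional epitope invalidates its junction."""
    fails_binding = ic50 < binding_threshold
    if percentile_threshold is None:
        # No percentile threshold set: only the binding threshold applies.
        return fails_binding
    fails_percentile = percentile < percentile_threshold
    if strategy == "conservative":
        return fails_binding or fails_percentile
    if strategy == "exploratory":
        return fails_binding and fails_percentile
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical epitope: weak IC50 (800 nM) but a low percentile rank (1.2).
for strategy in ("conservative", "exploratory"):
    print(strategy, epitope_fails(800.0, 1.2, 500.0, 2.0, strategy))
```

In this example the epitope fails only the percentile check, so the conservative strategy rejects the junction while the exploratory strategy keeps it.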
Running pVACvector with spacer amino acid sequences may help eliminate junctional
epitopes. The list of spacers to be tested is specified using the --spacers
parameter. Peptide combinations without a spacer can be tested by including
None in the list of spacers. The default spacer amino acid sequences are
“None”, “AAY”, “HHHH”, “GGS”, “GPGPG”, “HHAA”, “AAL”, “HH”, “HHC”, “HHH”, “HHHD”,
“HHL”, “HHHC”. Peptide junctions are tested with each spacer in the order that
they are specified. If a tested spacer results in a valid junction without any
well-binding junction epitopes, that junction will not be tested with any
other spacers, even if a different spacer could potentially result in better
junction scores. This reduces runtime. If a tested spacer for a junction doesn’t
yield a valid junction (i.e., there are well-binding junction epitopes) the junction
is tested with the next spacer in the input list.
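The "first valid spacer wins" behaviour described above can be sketched as a simple loop. Everything here is hypothetical: first_valid_spacer and toy_predictor are illustrative names, and the toy predictor merely flags the motif "LYW" in place of real binding predictions.

```python
DEFAULT_SPACERS = ["None", "AAY", "HHHH", "GGS", "GPGPG", "HHAA", "AAL",
                   "HH", "HHC", "HHH", "HHHD", "HHL", "HHHC"]

def first_valid_spacer(left, right, spacers, has_binding_junction_epitope):
    """Return the first spacer label giving a valid junction, or None if none does."""
    for spacer in spacers:
        linker = "" if spacer == "None" else spacer  # "None" means no spacer
        if not has_binding_junction_epitope(left + linker + right):
            return spacer  # stop here; later spacers are never tested
    return None

tested = []

def toy_predictor(construct):
    """Stand-in for binding prediction: treat the motif 'LYW' as a binder."""
    tested.append(construct)
    return "LYW" in construct

chosen = first_valid_spacer("SGGYALY", "WLYYSYG", DEFAULT_SPACERS, toy_predictor)
print(chosen, len(tested))
```

Only two constructs are evaluated in this toy run: the spacer-less junction fails, "AAY" succeeds, and the remaining eleven spacers are skipped, which is where the runtime saving comes from.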
If, after testing all spacers, no valid path is found, clipped versions of
peptides are tested by removing leading and/or trailing amino acids and
constructing junctions with the clipped peptides. The maximum number of amino
acids to clip is controlled by the --max-clip-length argument.
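As a rough illustration of what clipping produces, the following sketch enumerates every clipped version of one peptide for a given maximum clip length. It is an assumption-laden example: the function name is invented, and the order in which pVACvector actually tries clipped peptides may differ.

```python
def clipped_variants(peptide, max_clip_length=3):
    """List (start_clip, end_clip, clipped_sequence) for all allowed clips."""
    variants = []
    for start_clip in range(max_clip_length + 1):
        for end_clip in range(max_clip_length + 1):
            if start_clip == 0 and end_clip == 0:
                continue  # the unclipped peptide was already tested
            if start_clip + end_clip >= len(peptide):
                continue  # nothing would be left of the peptide
            end = len(peptide) - end_clip
            variants.append((start_clip, end_clip, peptide[start_clip:end]))
    return variants

variants = clipped_variants("WLYYSYGLLHIYGSGGYALYF", max_clip_length=3)
print(len(variants))
for start_clip, end_clip, sequence in variants[:3]:
    print(start_clip, end_clip, sequence)
```

The number of candidates per peptide grows with the square of the clip length, so larger --max-clip-length values can noticeably increase the number of junctions that need to be tested.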
In some cases, the (core) neoantigen candidate of a peptide sequence may be located
toward the beginning or end of the sequence. In these cases, clipping may
accidentally remove amino acids of the core neoantigen. To prevent this,
--max-clip-length should be set to the smallest number of amino acids flanking
the core neoantigen in any of the peptides to include in the vector. Alternatively, pVACvector also
supports specifying the core neoantigen in the FASTA header when using a FASTA
file as the input to pVACvector. If the core neoantigens for each sequence are specified in the
input FASTA file, pVACvector will not clip into these neoantigens, even if the
flanking sequence is smaller than the --max-clip-length. The core neoantigen should
be specified like so:
>Peptide1 {"Best Peptide": "LYYSYGLLHI"}
WLYYSYGLLHIYGSGGYALYF
In this example Peptide1 is the ID of the sequence, LYYSYGLLHI is
the core neoantigen candidate, and WLYYSYGLLHIYGSGGYALYF is the peptide
sequence to include in the vector. The Best Peptide information will already
be included in the FASTA headers if the FASTA file is created by using the pvacseq
generate_protein_fasta command in conjunction with an aggregated report TSV
as the --input-tsv parameter.
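To check how much flanking sequence each peptide has before choosing a --max-clip-length, a header in the format above can be parsed with a few lines of Python. This is a sketch, not part of pVACtools; parse_header and flank_lengths are hypothetical helper names, and the header is assumed to be an ID followed by a JSON object, as in the example.

```python
import json

def parse_header(header):
    """Split a FASTA header into (sequence ID, core neoantigen or None)."""
    text = header.lstrip(">").strip()
    seq_id, _, metadata = text.partition(" ")
    fields = json.loads(metadata) if metadata.strip() else {}
    return seq_id, fields.get("Best Peptide")

def flank_lengths(sequence, core):
    """Count amino acids before and after the core within the sequence."""
    start = sequence.find(core)
    if start == -1:
        raise ValueError("core neoantigen not found in sequence")
    return start, len(sequence) - (start + len(core))

seq_id, core = parse_header('>Peptide1 {"Best Peptide": "LYYSYGLLHI"}')
leading, trailing = flank_lengths("WLYYSYGLLHIYGSGGYALYF", core)
print(seq_id, core, leading, trailing)
```

For the example record there is one leading and ten trailing flanking amino acids, so without the Best Peptide annotation a --max-clip-length greater than 1 could clip into the core neoantigen.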
If no solution is found after testing all spacers and after clipping peptides, pVACvector
will attempt to find a partial solution by excluding peptide sequences. The
number of peptide sequences that are allowed to be removed is controlled via
the --allow-n-peptide-exclusion parameter. Partial solutions will be
written to their own result subdirectory. The subdirectory name reflects which
peptide(s) were removed from the partial solution.
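The number of partial solutions that may need to be explored can be estimated by enumerating the possible exclusions. The sketch below is illustrative only: the function name and peptide IDs are made up, and it says nothing about the order in which pVACvector tries exclusions or how it names the result subdirectories.

```python
from itertools import combinations

def exclusion_sets(peptide_ids, allow_n_peptide_exclusion=2):
    """Yield (removed, kept) for every way of dropping 1..n peptides."""
    for n in range(1, allow_n_peptide_exclusion + 1):
        for removed in combinations(peptide_ids, n):
            kept = [p for p in peptide_ids if p not in removed]
            yield removed, kept

peptides = ["Peptide1", "Peptide2", "Peptide3", "Peptide4"]
candidates = list(exclusion_sets(peptides, allow_n_peptide_exclusion=2))
print(len(candidates))
for removed, kept in candidates[:2]:
    print("removed:", removed, "kept:", kept)
```

With four peptides and up to two exclusions there are already ten candidate subsets, so raising --allow-n-peptide-exclusion on larger inputs can lengthen a run considerably.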
Our current recommendation is to run pVACvector several different ways, and choose the path resulting from the most conservative set of parameters.
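One way to organise such a series of runs is to generate the command lines programmatically. The sketch below only builds and prints commands; it does not execute them. The flags are taken from the usage text that follows, while the file names, sample name, allele, algorithm choice, and output directory layout are placeholders.

```python
import shlex
from itertools import product

def build_commands(input_file, sample_name, alleles, algorithms, output_root):
    """Build one pvacvector command per threshold/metric combination."""
    commands = []
    for threshold, metric in product([500, 1000], ["median", "lowest"]):
        output_dir = f"{output_root}/b{threshold}_{metric}"  # placeholder layout
        commands.append(
            ["pvacvector", "run", input_file, sample_name, ",".join(alleles),
             *algorithms, output_dir, "-b", str(threshold), "-m", metric]
        )
    return commands

commands = build_commands("peptides.fa", "sample1", ["HLA-A*02:01"],
                          ["MHCflurry", "NetMHCpan"], "vector_runs")
for command in commands:
    print(shlex.join(command))  # shell-quoted, ready to copy and paste
```

The last command in the list (1000nM with the lowest metric) is the most conservative of the four; if it yields a valid path, that is the one to prefer under the recommendation above.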
usage: pvacvector run [-h] [--iedb-install-directory IEDB_INSTALL_DIRECTORY]
[-r IEDB_RETRIES] [-k] [-t N_THREADS]
[--netmhciipan-version {4.3,4.2,4.1,4.0}]
[-e1 CLASS_I_EPITOPE_LENGTH]
[-e2 CLASS_II_EPITOPE_LENGTH] [-b BINDING_THRESHOLD]
[--percentile-threshold PERCENTILE_THRESHOLD]
[--percentile-threshold-strategy {conservative,exploratory}]
[--allele-specific-binding-thresholds]
[-m {lowest,median}] [--biotypes BIOTYPES]
[--allow-incomplete-transcripts] [-v INPUT_VCF]
[-n INPUT_N_MER] [--spacers SPACERS]
[--max-clip-length MAX_CLIP_LENGTH]
[--allow-n-peptide-exclusion ALLOW_N_PEPTIDE_EXCLUSION]
input_file sample_name allele
{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
[{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii} ...]
output_dir
Run the pVACvector pipeline
positional arguments:
input_file A .fa file with peptides or a pVACseq .tsv file with
epitopes to use for vector design.
sample_name The name of the sample being processed. This will be
used as a prefix for output files.
allele Name of the allele to use for epitope prediction.
Multiple alleles can be specified using a comma-
separated list. For a list of available alleles, use:
`pvacvector valid_alleles`.
{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
The epitope prediction algorithms to use. Multiple
prediction algorithms can be specified, separated by
spaces.
output_dir The directory for writing all result files.
optional arguments:
-h, --help show this help message and exit
--iedb-install-directory IEDB_INSTALL_DIRECTORY
Directory that contains the local installation of IEDB
MHC I and/or MHC II. (default: None)
-r IEDB_RETRIES, --iedb-retries IEDB_RETRIES
Number of retries when making requests to the IEDB
RESTful web interface. Must be less than or equal to
100. (default: 5)
-k, --keep-tmp-files Keep intermediate output files. This might be useful
for debugging purposes. (default: False)
-t N_THREADS, --n-threads N_THREADS
Number of threads to use for parallelizing peptide-MHC
binding prediction calls. (default: 1)
--netmhciipan-version {4.3,4.2,4.1,4.0}
Specify the version of NetMHCIIpan or NetMHCIIpanEL to
be used during the run. (default: 4.1)
-e1 CLASS_I_EPITOPE_LENGTH, --class-i-epitope-length CLASS_I_EPITOPE_LENGTH
Length of MHC Class I junctional epitopes to predict.
Multiple epitope lengths can be specified using a
comma-separated list. Typical epitope lengths vary
between 8-15. Required for Class I prediction
algorithms. (default: [8, 9, 10, 11])
-e2 CLASS_II_EPITOPE_LENGTH, --class-ii-epitope-length CLASS_II_EPITOPE_LENGTH
Length of MHC Class II junctional epitopes to predict.
Multiple epitope lengths can be specified using a
comma-separated list. Typical epitope lengths vary
between 11-30. Required for Class II prediction
algorithms. (default: [12, 13, 14, 15, 16, 17, 18])
-b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
Fail junctions where any junctional epitope has ic50
binding scores below this value. (default: 500)
--percentile-threshold PERCENTILE_THRESHOLD
Fail junctions where any junctional epitope has a
percentile rank below this value. (default: None)
--percentile-threshold-strategy {conservative,exploratory}
Specify how to evaluate junctional epitopes if a
percentile threshold is set. The 'conservative' option
fails a junction if a junctional epitope fails EITHER
the binding threshold OR the percentile threshold
(default). The 'exploratory' option fails a junction
only if a junctional epitope fails BOTH the binding
threshold AND the percentile threshold. (default:
conservative)
--allele-specific-binding-thresholds
Use allele-specific binding thresholds when evaluating
junctional epitopes. To print the allele-specific
binding thresholds run `pvacvector
allele_specific_cutoffs`. If an allele does not have a
special threshold value, the `--binding-threshold`
value will be used. (default: False)
-m {lowest,median}, --top-score-metric {lowest,median}
The ic50 scoring metric to use when evaluating
junctional epitopes by binding-threshold. lowest: Use
the best MT Score (i.e. the lowest MT ic50 binding
score of all chosen prediction methods). median: Use
the median MT Score (i.e. the median MT ic50 binding
score of all chosen prediction methods). (default:
median)
--biotypes BIOTYPES A list of biotypes to use for pre-filtering
transcripts when running with an input VCF. (default:
['protein_coding'])
--allow-incomplete-transcripts
By default, transcripts annotated with incomplete CDS
(i.e., 'cds_start_NF' or 'cds_end_NF' flags in the VEP
CSQ field) are excluded from analysis, as they often
produce invalid protein sequences. Use this flag to
allow candidates from such transcripts. Only peptides
that do not contain 'X' will be included. These
candidates will be deprioritized relative to those
from transcripts without incomplete CDS flags.
(default: False)
-v INPUT_VCF, --input-vcf INPUT_VCF
Path to original pVACseq input VCF file. Required if
input file is a pVACseq TSV. (default: None)
-n INPUT_N_MER, --input-n-mer INPUT_N_MER
Length of the peptide sequence to use when creating
the FASTA from the pVACseq TSV. (default: 25)
--spacers SPACERS Comma-separated list of spacers to use for testing
junction epitopes. Include None to test junctions
without spacers. Peptide combinations will be tested
with each spacer in the order specified. (default:
None,AAY,HHHH,GGS,GPGPG,HHAA,AAL,HH,HHC,HHH,HHHD,HHL,HHHC)
--max-clip-length MAX_CLIP_LENGTH
Number of amino acids to permit clipping from the
start and/or end of peptides in order to test novel
junction epitopes when the first pass on the full
peptide fails. (default: 3)
--allow-n-peptide-exclusion ALLOW_N_PEPTIDE_EXCLUSION
If no solution is found after adding spacers and
clipping peptides, attempt to find partial solutions
with up to n peptides removed. (default: 2)