Usage
Warning
Using a local IEDB installation is strongly recommended for larger datasets or when making predictions for many alleles, epitope lengths, or prediction algorithms. More information on how to install IEDB locally can be found on the Installation page.
usage: pvacseq run [-h] [--iedb-install-directory IEDB_INSTALL_DIRECTORY]
[-r IEDB_RETRIES] [-k] [-t N_THREADS]
[--netmhciipan-version {4.3,4.2,4.1,4.0}]
[--use-normalized-percentiles]
[--reference-scores-path REFERENCE_SCORES_PATH]
[-e1 CLASS_I_EPITOPE_LENGTH] [-e2 CLASS_II_EPITOPE_LENGTH]
[-b BINDING_THRESHOLD]
[--binding-percentile-threshold BINDING_PERCENTILE_THRESHOLD]
[--immunogenicity-percentile-threshold IMMUNOGENICITY_PERCENTILE_THRESHOLD]
[--presentation-percentile-threshold PRESENTATION_PERCENTILE_THRESHOLD]
[--percentile-threshold-strategy {conservative,exploratory}]
[--allele-specific-binding-thresholds] [-m {lowest,median}]
[-m2 TOP_SCORE_METRIC2] [--pass-only]
[--normal-sample-name NORMAL_SAMPLE_NAME]
[--normal-cov NORMAL_COV] [--tdna-cov TDNA_COV]
[--trna-cov TRNA_COV] [--normal-vaf NORMAL_VAF]
[--tdna-vaf TDNA_VAF] [--trna-vaf TRNA_VAF]
[--tumor-purity TUMOR_PURITY]
[--transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY]
[--maximum-transcript-support-level {1,2,3,4,5}]
[--biotypes BIOTYPES] [--allow-incomplete-transcripts]
[--net-chop-method {cterm,20s}] [--netmhc-stab]
[--net-chop-threshold NET_CHOP_THRESHOLD]
[--problematic-amino-acids PROBLEMATIC_AMINO_ACIDS]
[--run-reference-proteome-similarity]
[--blastp-path BLASTP_PATH]
[--blastp-db {refseq_select_prot,refseq_protein}]
[--peptide-fasta PEPTIDE_FASTA] [-a {sample_name}]
[-s FASTA_SIZE] [-d DOWNSTREAM_SEQUENCE_LENGTH]
[--genes-of-interest-file GENES_OF_INTEREST_FILE]
[--aggregate-inclusion-binding-threshold AGGREGATE_INCLUSION_BINDING_THRESHOLD]
[--aggregate-inclusion-count-limit AGGREGATE_INCLUSION_COUNT_LIMIT]
[-p PHASED_PROXIMAL_VARIANTS_VCF] [-c MINIMUM_FOLD_CHANGE]
[--allele-specific-anchors]
[--anchor-contribution-threshold ANCHOR_CONTRIBUTION_THRESHOLD]
[--expn-val EXPN_VAL] [--run-ml-predictions]
[--ml-threshold-accept ML_THRESHOLD_ACCEPT]
[--ml-threshold-reject ML_THRESHOLD_REJECT]
input_file sample_name allele
{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
[{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii} ...]
output_dir
Run the pVACseq pipeline
positional arguments:
input_file A VEP-annotated single- or multi-sample VCF containing
genotype, transcript, Wildtype protein sequence, and
Frameshift protein sequence information. The VCF may be
gzipped (requires tabix index).
sample_name The name of the tumor sample being processed. When
processing a multi-sample VCF the sample name must be
a sample ID in the input VCF #CHROM header line.
allele Name of the allele to use for epitope prediction.
Multiple alleles can be specified using a comma-
separated list. For a list of available alleles, use:
`pvacseq valid_alleles`.
{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,MixMHCpred,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PRIME,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
The epitope prediction algorithms to use. Multiple
prediction algorithms can be specified, separated by
spaces.
output_dir The directory for writing all result files.
optional arguments:
-h, --help show this help message and exit
--iedb-install-directory IEDB_INSTALL_DIRECTORY
Directory that contains the local installation of IEDB
MHC I and/or MHC II. (default: None)
-r IEDB_RETRIES, --iedb-retries IEDB_RETRIES
Number of retries when making requests to the IEDB
RESTful web interface. Must be less than or equal to
100. (default: 5)
-k, --keep-tmp-files Keep intermediate output files. This might be useful
for debugging purposes. (default: False)
-t N_THREADS, --n-threads N_THREADS
Number of threads to use for parallelizing peptide-MHC
binding prediction calls. (default: 1)
--netmhciipan-version {4.3,4.2,4.1,4.0}
Specify the version of NetMHCIIpan or NetMHCIIpanEL to
be used during the run. (default: 4.1)
--use-normalized-percentiles
When set, calculate normalized percentile scores for
all prediction algorithms. For algorithms that do not
natively provide percentiles, percentiles will be
derived by comparing prediction scores against pre-
computed reference distributions. For algorithms that
do provide native percentiles, their values will be
overwritten with the normalized percentile. (default:
False)
--reference-scores-path REFERENCE_SCORES_PATH
Directory to store pre-computed reference percentile
files. If a file is missing, it will be downloaded
here when --use-normalized-percentiles is set.
(default: /tmp)
-e1 CLASS_I_EPITOPE_LENGTH, --class-i-epitope-length CLASS_I_EPITOPE_LENGTH
Length of MHC Class I subpeptides (neoepitopes) to
predict. Multiple epitope lengths can be specified
using a comma-separated list. Typical epitope lengths
vary between 8-15. Required for Class I prediction
algorithms. (default: [8, 9, 10, 11])
-e2 CLASS_II_EPITOPE_LENGTH, --class-ii-epitope-length CLASS_II_EPITOPE_LENGTH
Length of MHC Class II subpeptides (neoepitopes) to
predict. Multiple epitope lengths can be specified
using a comma-separated list. Typical epitope lengths
vary between 11-30. Required for Class II prediction
algorithms. (default: [12, 13, 14, 15, 16, 17, 18])
-b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
Report only epitopes where the mutant allele has ic50
binding scores below this value. (default: 500)
--binding-percentile-threshold BINDING_PERCENTILE_THRESHOLD
Report only epitopes where the mutant allele has a
binding percentile rank below this value. (default:
2.0)
--immunogenicity-percentile-threshold IMMUNOGENICITY_PERCENTILE_THRESHOLD
Report only epitopes where the mutant allele has an
immunogenicity percentile rank below this value.
(default: 2.0)
--presentation-percentile-threshold PRESENTATION_PERCENTILE_THRESHOLD
Report only epitopes where the mutant allele has a
presentation percentile rank below this value.
(default: 2.0)
--percentile-threshold-strategy {conservative,exploratory}
Specify the candidate inclusion strategy. The
'conservative' option requires a candidate to pass
BOTH the binding threshold and all percentile
thresholds set (default). The 'exploratory' option
requires a candidate to pass EITHER the binding
threshold or any of the percentile thresholds set.
(default: conservative)
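
The difference between the two strategies can be illustrated with a minimal Python sketch. This is not pVACseq's implementation: the function name `passes_thresholds`, the dictionary keys, and the strict `<` comparison (a reading of "below this value") are assumptions made for illustration.

```python
def passes_thresholds(ic50, percentiles, binding_threshold=500.0,
                      percentile_thresholds=None, strategy="conservative"):
    """Return True if a candidate passes under the chosen strategy.

    `percentiles` and `percentile_thresholds` are dicts keyed by
    hypothetical labels such as "binding" or "presentation".
    """
    percentile_thresholds = percentile_thresholds or {}
    # One boolean per threshold that is set (assumed strict comparison).
    checks = [ic50 < binding_threshold]
    checks += [percentiles[name] < cutoff
               for name, cutoff in percentile_thresholds.items()]
    if strategy == "conservative":
        return all(checks)   # must pass every threshold
    if strategy == "exploratory":
        return any(checks)   # passing a single threshold is enough
    raise ValueError(f"unknown strategy: {strategy}")


# A candidate with a good IC50 but a binding percentile above the cutoff.
cutoffs = {"binding": 2.0}
conservative = passes_thresholds(400.0, {"binding": 3.0},
                                 percentile_thresholds=cutoffs)
exploratory = passes_thresholds(400.0, {"binding": 3.0},
                                percentile_thresholds=cutoffs,
                                strategy="exploratory")
print(conservative, exploratory)
```

In this example the candidate is rejected under `conservative` and kept under `exploratory`, because only the IC50 check passes.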
--allele-specific-binding-thresholds
Use allele-specific binding thresholds. To print the
allele-specific binding thresholds run `pvacseq
allele_specific_cutoffs`. If an allele does not have a
special threshold value, the `--binding-threshold`
value will be used. (default: False)
-m {lowest,median}, --top-score-metric {lowest,median}
The ic50 scoring metric to use when filtering epitopes
by binding-threshold or minimum fold change. lowest:
Use the best MT Score and Corresponding Fold Change
(i.e. the lowest MT ic50 binding score and
corresponding fold change of all chosen prediction
methods). median: Use the median MT Score and Median
Fold Change (i.e. the median MT ic50 binding score and
fold change of all chosen prediction methods).
(default: median)
-m2 TOP_SCORE_METRIC2, --top-score-metric2 TOP_SCORE_METRIC2
Which metrics to consider when selecting the best
peptide in the aggregate report and the top score
filter step (filtered report). Each specified metric
will be ranked and the sum of these ranks will be
used. This rank sum is also used as the primary
sorting criterion in the aggregated report for the
candidates within each tier as well as in the filtered
report. Whether the lowest or median is considered for
each metric is controlled by the --top-score-metric
parameter. (default: ['ic50', 'combined_percentile'])
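
The rank-sum idea described above can be sketched in Python as follows. This is an illustration only: the candidate records, the `id` field, the assumption that lower values are better for every metric, and the tie handling (ties broken by input order) are not taken from pVACseq.

```python
def rank_sum_order(candidates, metrics=("ic50", "combined_percentile")):
    """Order candidate ids by the sum of their per-metric ranks (best first)."""
    totals = {c["id"]: 0 for c in candidates}
    for metric in metrics:
        # Rank 1 goes to the lowest (assumed best) value for this metric.
        ordered = sorted(candidates, key=lambda c: c[metric])
        for rank, candidate in enumerate(ordered, start=1):
            totals[candidate["id"]] += rank
    return sorted(totals, key=lambda cid: totals[cid])


# Hypothetical candidates; values are made up for the example.
candidates = [
    {"id": "A", "ic50": 50.0, "combined_percentile": 2.5},
    {"id": "B", "ic50": 100.0, "combined_percentile": 0.2},
    {"id": "C", "ic50": 400.0, "combined_percentile": 0.5},
]
order = rank_sum_order(candidates)
print(order)
```

Here candidate B comes first: it ranks second on IC50 and first on percentile (rank sum 3), ahead of A (rank sum 4) and C (rank sum 5).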
--pass-only Only process VCF entries with a PASS status. (default:
False)
--normal-sample-name NORMAL_SAMPLE_NAME
In a multi-sample VCF, the name of the matched normal
sample. (default: None)
--normal-cov NORMAL_COV
Normal Coverage Cutoff. Only sites above this read
depth cutoff will be considered. (default: 5)
--tdna-cov TDNA_COV Tumor DNA Coverage Cutoff. Only sites above this read
depth cutoff will be considered. (default: 10)
--trna-cov TRNA_COV Tumor RNA Coverage Cutoff. Only sites above this read
depth cutoff will be considered. (default: 10)
--normal-vaf NORMAL_VAF
Normal VAF Cutoff in decimal format. Only sites BELOW
this cutoff in normal will be considered. (default:
0.02)
--tdna-vaf TDNA_VAF Tumor DNA VAF Cutoff in decimal format. Only sites
above this cutoff will be considered. (default: 0.25)
--trna-vaf TRNA_VAF Tumor RNA VAF Cutoff in decimal format. Only sites
above this cutoff will be considered. (default: 0.25)
--tumor-purity TUMOR_PURITY
Value between 0 and 1 indicating the fraction of tumor
cells in the tumor sample. Information is used during
aggregate report creation for a simple estimation of
whether variants are subclonal or clonal based on VAF.
If not provided, purity is estimated directly from the
VAFs. (default: None)
--transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY
Specify the criteria to consider when prioritizing or
filtering transcripts of the neoantigen candidates
during aggregate report creation or TSL filtering.
'canonical' will prioritize/select candidates
resulting from variants on an Ensembl canonical
transcript. 'mane_select' will prioritize/select
candidates resulting from variants on a MANE select
transcript. 'tsl' will prioritize/select candidates
where the transcript support level (TSL) matches the
--maximum-transcript-support-level. When selecting
more than one criterion, a transcript meeting ANY of
the selected criteria will be prioritized/selected.
(default: ['canonical', 'mane_select', 'tsl'])
--maximum-transcript-support-level {1,2,3,4,5}
The threshold to use for filtering epitopes on the
Ensembl transcript support level (TSL). Keep all
epitopes with a transcript support level less than or
equal to this cutoff. (default: 1)
--biotypes BIOTYPES A list of biotypes to use for pre-filtering
transcripts for processing in the pipeline. (default:
['protein_coding'])
--allow-incomplete-transcripts
By default, transcripts annotated with incomplete CDS
(i.e., 'cds_start_NF' or 'cds_end_NF' flags in the VEP
CSQ field) are excluded from analysis, as they often
produce invalid protein sequences. Use this flag to
allow candidates from such transcripts. Only peptides
that do not contain 'X' will be included. These
candidates will be deprioritized relative to those
from transcripts without incomplete CDS flags.
(default: False)
--net-chop-method {cterm,20s}
NetChop prediction method to use ("cterm" for C term
3.0, "20s" for 20S 3.0). C-term 3.0 is trained with
publicly available MHC class I ligands and the authors
believe that it performs best in predicting the
boundaries of CTL epitopes. 20S is trained with in
vitro degradation data. (default: None)
--netmhc-stab Run NetMHCStabPan after all filtering and add
stability predictions to predicted epitopes. (default:
False)
--net-chop-threshold NET_CHOP_THRESHOLD
NetChop prediction threshold (increasing the threshold
results in better specificity, but worse sensitivity).
(default: 0.5)
--problematic-amino-acids PROBLEMATIC_AMINO_ACIDS
A list of amino acids to consider as problematic. Each
entry can be specified in the following format:
`amino_acid(s)`: One or more one-letter amino acid
codes. Any occurrence of this amino acid string,
regardless of the position in the epitope, is
problematic. When specifying more than one amino acid,
they will need to occur together in the specified
order. `amino_acid:position`: A one letter amino acid
code, followed by a colon separator, followed by a
positive integer position (one-based). The occurrence
of this amino acid at the position specified is
problematic. E.g., G:2 would check for a Glycine at
the second position of the epitope. The N-terminus is
defined as position 1. `amino_acid:-position`: A one
letter amino acid code, followed by a colon separator,
followed by a negative integer position. The
occurrence of this amino acid at the specified
position from the end of the epitope is problematic.
E.g., G:-3 would check for a Glycine at the third
position from the end of the epitope. The C-terminus
is defined as position -1. (default: None)
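
The three entry formats above can be illustrated with a small Python sketch. The helper name `is_problematic` is hypothetical, and treating an out-of-range position as "not problematic" is an assumption rather than documented pVACseq behavior.

```python
def is_problematic(epitope, rule):
    """Check one problematic-amino-acid rule against an epitope string."""
    if ":" not in rule:
        # `amino_acid(s)`: the string may occur anywhere, in the given order.
        return rule in epitope
    amino_acid, position_text = rule.split(":")
    position = int(position_text)
    if position == 0:
        raise ValueError("positions are one-based; 0 is not valid")
    if abs(position) > len(epitope):
        return False  # assumption: positions outside the epitope never match
    # Positive positions count from the N-terminus (1 = first residue);
    # negative positions count from the C-terminus (-1 = last residue).
    index = position - 1 if position > 0 else position
    return epitope[index] == amino_acid


print(is_problematic("AGLKMNPQR", "G:2"))    # Glycine at position 2
print(is_problematic("AGLKMNGQR", "G:-3"))   # Glycine third from the end
print(is_problematic("ALCCKMNPQ", "CC"))     # two Cysteines in a row, anywhere
```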
--run-reference-proteome-similarity
Blast peptides against the reference proteome.
(default: False)
--blastp-path BLASTP_PATH
Blastp installation path. (default: None)
--blastp-db {refseq_select_prot,refseq_protein}
The blastp database to use. (default:
refseq_select_prot)
--peptide-fasta PEPTIDE_FASTA
When running the reference proteome similarity step,
use this reference peptide FASTA file to find matches
instead of blastp. (default: None)
-a {sample_name}, --additional-report-columns {sample_name}
Additional columns to output in the final report. If
sample_name is chosen, this will add a column with the
sample name in every row of the output. This can be
useful if you later want to concatenate results from
multiple individuals into a single file. (default:
None)
-s FASTA_SIZE, --fasta-size FASTA_SIZE
Number of FASTA entries per IEDB request. For some
resource-intensive prediction algorithms like
PickPocket and NetMHCpan it might be helpful to reduce
this number. Needs to be an even number. (default:
200)
-d DOWNSTREAM_SEQUENCE_LENGTH, --downstream-sequence-length DOWNSTREAM_SEQUENCE_LENGTH
Cap to limit the downstream sequence length for
frameshifts when creating the FASTA file. Use 'full'
to include the full downstream sequence. (default:
1000)
--genes-of-interest-file GENES_OF_INTEREST_FILE
A genes of interest file. Predictions resulting from
variants on genes in this list will be marked in the
result files. The file should be formatted to have
each gene on a separate line without a header line. If
no file is specified, the Cancer Gene Census list of
high-confidence genes is used as the default.
(default: None)
--aggregate-inclusion-binding-threshold AGGREGATE_INCLUSION_BINDING_THRESHOLD
Threshold for including epitopes when creating the
aggregate report. (default: 5000)
--aggregate-inclusion-count-limit AGGREGATE_INCLUSION_COUNT_LIMIT
Limit neoantigen candidates included in the aggregate
report to only the best n candidates per variant.
(default: 15)
-p PHASED_PROXIMAL_VARIANTS_VCF, --phased-proximal-variants-vcf PHASED_PROXIMAL_VARIANTS_VCF
A VCF with phased proximal variant information. Must
be gzipped and tabix indexed. (default: None)
-c MINIMUM_FOLD_CHANGE, --minimum-fold-change MINIMUM_FOLD_CHANGE
Minimum fold change between mutant (MT) binding score
and wild-type (WT) score (fold change = WT/MT). The
default is 0, which filters no results, but 1 is often
a sensible choice (requiring that binding is better to
the MT than WT peptide). This fold change is sometimes
referred to as a differential agretopicity index.
(default: 0.0)
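
As a worked example of the fold change defined above (WT/MT), consider the following Python sketch. The function names are hypothetical, and treating the cutoff as inclusive (`>=`) is an assumption based on the word "minimum".

```python
def fold_change(wt_ic50, mt_ic50):
    """Fold change as defined in the help text: WT score divided by MT score."""
    return wt_ic50 / mt_ic50


def passes_fold_change(wt_ic50, mt_ic50, minimum_fold_change=0.0):
    # Assumption: the cutoff is inclusive.
    return fold_change(wt_ic50, mt_ic50) >= minimum_fold_change


# MT binds 4x better than WT (a lower IC50 means stronger binding).
print(fold_change(1000.0, 250.0))
# MT binds worse than WT: kept at the default of 0, dropped with a cutoff of 1.
print(passes_fold_change(200.0, 400.0), passes_fold_change(200.0, 400.0, 1.0))
```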
--allele-specific-anchors
Use allele-specific anchor positions when tiering
epitopes in the aggregate report. This option is
available for 8, 9, 10, and 11mers and only for HLA-A,
B, and C alleles. If this option is not enabled or as
a fallback for unsupported lengths and alleles, the
default positions of 1, 2, epitope length - 1, and
epitope length are used. Please see
https://doi.org/10.1101/2020.12.08.416271 for more
details. (default: False)
--anchor-contribution-threshold ANCHOR_CONTRIBUTION_THRESHOLD
For determining allele-specific anchors, each position
is assigned a score based on how binding is influenced
by mutations. From these scores, the relative
contribution of each position to the overall binding
is calculated. Starting with the highest relative
contribution, positions whose scores together account
for the selected contribution threshold are assigned
as anchor locations. As a result, a higher threshold
leads to the inclusion of more positions to be
considered anchors. (default: 0.8)
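
The selection procedure described above can be sketched in Python. The per-position scores below are invented for the example, and the function `anchor_positions` is a hypothetical helper, not the code pVACseq uses.

```python
def anchor_positions(position_scores, threshold=0.8):
    """Pick anchor positions whose relative contributions reach the threshold.

    `position_scores` maps one-based peptide positions to a score describing
    how strongly mutations at that position influence binding.
    """
    total = sum(position_scores.values())
    # Visit positions from the highest to the lowest score.
    ranked = sorted(position_scores.items(), key=lambda item: item[1], reverse=True)
    anchors = []
    cumulative = 0.0
    for position, score in ranked:
        if cumulative >= threshold:
            break
        anchors.append(position)
        cumulative += score / total  # relative contribution of this position
    return sorted(anchors)


# Hypothetical scores for a 9-mer, dominated by positions 2 and 9.
scores = {1: 3, 2: 45, 3: 3, 4: 2, 5: 2, 6: 2, 7: 2, 8: 1, 9: 40}
print(anchor_positions(scores, threshold=0.8))
print(anchor_positions(scores, threshold=0.9))
```

With these made-up scores, raising the threshold from 0.8 to 0.9 adds positions 1 and 3 to the anchors at 2 and 9, which matches the statement that a higher threshold includes more positions.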
--expn-val EXPN_VAL Gene and Transcript Expression cutoff. Only sites
above this cutoff will be considered. (default: 1.0)
--run-ml-predictions Enable ML-based neoantigen evaluation predictions.
(default: False)
--ml-threshold-accept ML_THRESHOLD_ACCEPT
Threshold for Accept predictions in ML model.
(default: 0.55)
--ml-threshold-reject ML_THRESHOLD_REJECT
Threshold for Reject predictions in ML model.
(default: 0.3)
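
To show how the positional arguments and a few of the options above fit together, here is a minimal Python sketch that assembles a `pvacseq run` argument list. The helper `build_pvacseq_command`, the file name `annotated.vcf.gz`, the sample name `TUMOR`, and the output directory `results` are placeholders chosen for illustration; only flags listed in the help text are used.

```python
import shlex


def build_pvacseq_command(input_vcf, sample_name, alleles, algorithms, output_dir,
                          epitope_lengths=(8, 9, 10, 11), n_threads=1,
                          iedb_install_directory=None):
    """Assemble an argument list for `pvacseq run` (hypothetical helper)."""
    argv = ["pvacseq", "run"]
    if iedb_install_directory is not None:
        argv += ["--iedb-install-directory", iedb_install_directory]
    # Epitope lengths and alleles are comma-separated lists.
    argv += ["-e1", ",".join(str(length) for length in epitope_lengths)]
    argv += ["-t", str(n_threads)]
    argv += [input_vcf, sample_name, ",".join(alleles)]
    # Prediction algorithms are separate, space-delimited arguments.
    argv += list(algorithms)
    argv.append(output_dir)
    return argv


argv = build_pvacseq_command(
    "annotated.vcf.gz", "TUMOR",
    alleles=["HLA-A*02:01", "HLA-B*35:01"],
    algorithms=["MHCflurry", "NetMHCpan"],
    output_dir="results",
    n_threads=4,
)
print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run(argv, check=True)` on a system where pVACseq is installed. Building a list rather than a single string avoids shell-quoting problems with allele names that contain `*`.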