Usage
Warning
Using a local IEDB installation is strongly recommended for larger datasets or when making predictions for many alleles, epitope lengths, or prediction algorithms. More information on how to install IEDB locally can be found on the Installation page.
usage: pvacsplice run [-h] [--iedb-install-directory IEDB_INSTALL_DIRECTORY]
[-r IEDB_RETRIES] [-k] [-t N_THREADS]
[--netmhciipan-version {4.3,4.2,4.1,4.0}]
[-e1 CLASS_I_EPITOPE_LENGTH]
[-e2 CLASS_II_EPITOPE_LENGTH] [-b BINDING_THRESHOLD]
[--percentile-threshold PERCENTILE_THRESHOLD]
[--percentile-threshold-strategy {conservative,exploratory}]
[--allele-specific-binding-thresholds]
[-m {lowest,median}] [-m2 {ic50,percentile}]
[--pass-only] [--normal-sample-name NORMAL_SAMPLE_NAME]
[--normal-cov NORMAL_COV] [--tdna-cov TDNA_COV]
[--trna-cov TRNA_COV] [--normal-vaf NORMAL_VAF]
[--tdna-vaf TDNA_VAF] [--trna-vaf TRNA_VAF]
[--tumor-purity TUMOR_PURITY]
[--transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY]
[--maximum-transcript-support-level {1,2,3,4,5}]
[--biotypes BIOTYPES] [--allow-incomplete-transcripts]
[--net-chop-method {cterm,20s}] [--netmhc-stab]
[--net-chop-threshold NET_CHOP_THRESHOLD]
[--problematic-amino-acids PROBLEMATIC_AMINO_ACIDS]
[--run-reference-proteome-similarity]
[--blastp-path BLASTP_PATH]
[--blastp-db {refseq_select_prot,refseq_protein}]
[--peptide-fasta PEPTIDE_FASTA] [-a {sample_name}]
[-s FASTA_SIZE] [--exclude-NAs]
[--genes-of-interest-file GENES_OF_INTEREST_FILE]
[--aggregate-inclusion-binding-threshold AGGREGATE_INCLUSION_BINDING_THRESHOLD]
[--aggregate-inclusion-count-limit AGGREGATE_INCLUSION_COUNT_LIMIT]
[-j JUNCTION_SCORE] [-v VARIANT_DISTANCE] [-g]
[--anchor-types ANCHOR_TYPES] [--expn-val EXPN_VAL]
input_file sample_name allele
{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
[{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii} ...]
output_dir annotated_vcf ref_fasta gtf_file
Run the pVACsplice pipeline
positional arguments:
input_file RegTools junctions output TSV file
sample_name The name of the sample being processed. This will be
used as a prefix for output files.
allele Name of the allele to use for epitope prediction.
Multiple alleles can be specified using a comma-
separated list. For a list of available alleles, use:
`pvacsplice valid_alleles`.
{BigMHC_EL,BigMHC_IM,DeepImmuno,MHCflurry,MHCflurryEL,MHCnuggetsI,MHCnuggetsII,NNalign,NetMHC,NetMHCIIpan,NetMHCIIpanEL,NetMHCcons,NetMHCpan,NetMHCpanEL,PickPocket,SMM,SMMPMBEC,SMMalign,all,all_class_i,all_class_ii}
The epitope prediction algorithms to use. Multiple
prediction algorithms can be specified, separated by
spaces.
output_dir The directory for writing all result files.
annotated_vcf A VEP-annotated single- or multi-sample VCF containing
genotype and transcript information. The VCF may be
gzipped (requires tabix index).
ref_fasta A reference DNA FASTA file. Note: this input should be
the same as the RegTools fasta input.
gtf_file A reference GTF file. Note: this input should be the
same as the RegTools gtf input.
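The positional order in the usage synopsis above can be easy to get wrong, so here is a minimal sketch of assembling the argument list programmatically. The helper `build_pvacsplice_command`, all file names, and the allele names are hypothetical placeholders, not part of pVACsplice. Placing the optional arguments before the positionals is an assumption made here for readability.

```python
import shlex


def build_pvacsplice_command(input_file, sample_name, alleles, algorithms,
                             output_dir, annotated_vcf, ref_fasta, gtf_file,
                             extra_options=()):
    """Assemble a `pvacsplice run` argument list (hypothetical helper)."""
    return [
        "pvacsplice", "run",
        *extra_options,        # optional arguments, e.g. ["-t", "4"]
        input_file,            # RegTools junctions TSV
        sample_name,
        ",".join(alleles),     # multiple alleles are comma-separated
        *algorithms,           # multiple algorithms are space-separated
        output_dir,
        annotated_vcf,
        ref_fasta,
        gtf_file,
    ]


# Placeholder inputs for illustration only.
cmd = build_pvacsplice_command(
    "junctions.tsv", "sample1", ["HLA-A*02:01", "HLA-B*35:01"],
    ["MHCflurry", "NetMHCpan"], "out_dir", "annotated.vcf.gz",
    "ref.fa", "ref.gtf", extra_options=["-e1", "8,9,10", "-t", "4"],
)
print(shlex.join(cmd))  # shell-quoted command line for inspection
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real files.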
optional arguments:
-h, --help show this help message and exit
--iedb-install-directory IEDB_INSTALL_DIRECTORY
Directory that contains the local installation of IEDB
MHC I and/or MHC II. (default: None)
-r IEDB_RETRIES, --iedb-retries IEDB_RETRIES
Number of retries when making requests to the IEDB
RESTful web interface. Must be less than or equal to
100. (default: 5)
-k, --keep-tmp-files Keep intermediate output files. This might be useful
for debugging purposes. (default: False)
-t N_THREADS, --n-threads N_THREADS
Number of threads to use for parallelizing peptide-MHC
binding prediction calls. (default: 1)
--netmhciipan-version {4.3,4.2,4.1,4.0}
Specify the version of NetMHCIIpan or NetMHCIIpanEL to
be used during the run. (default: 4.1)
-e1 CLASS_I_EPITOPE_LENGTH, --class-i-epitope-length CLASS_I_EPITOPE_LENGTH
Length of MHC Class I subpeptides (neoepitopes) to
predict. Multiple epitope lengths can be specified
using a comma-separated list. Typical epitope lengths
vary between 8-15. Required for Class I prediction
algorithms. (default: [8, 9, 10, 11])
-e2 CLASS_II_EPITOPE_LENGTH, --class-ii-epitope-length CLASS_II_EPITOPE_LENGTH
Length of MHC Class II subpeptides (neoepitopes) to
predict. Multiple epitope lengths can be specified
using a comma-separated list. Typical epitope lengths
vary between 11-30. Required for Class II prediction
algorithms. (default: [12, 13, 14, 15, 16, 17, 18])
-b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
When creating the filtered.tsv report, only include
epitopes where the mutant allele has ic50 binding
scores below this value. When creating the
aggregated.tsv report, only bin candidates into the
Pass tier that meet this threshold. (default: 500)
--percentile-threshold PERCENTILE_THRESHOLD
When creating the filtered.tsv report, only include
epitopes where the mutant allele has a percentile rank
below this value. When creating the aggregated.tsv
report, only bin candidates into the Pass tier that
meet this threshold. (default: None)
--percentile-threshold-strategy {conservative,exploratory}
Specify the candidate inclusion strategy. The
'conservative' option requires a candidate to pass
BOTH the binding threshold and percentile threshold
(default). The 'exploratory' option requires a
candidate to pass EITHER the binding threshold or the
percentile threshold. (default: conservative)
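The interaction between `--binding-threshold`, `--percentile-threshold`, and `--percentile-threshold-strategy` can be sketched as follows. This is an illustration of the described logic, not the pVACsplice implementation. The function name is hypothetical, and treating an unset percentile threshold as "IC50 check only" is an assumption.

```python
def passes_binding_filters(ic50, percentile, binding_threshold=500.0,
                           percentile_threshold=None,
                           strategy="conservative"):
    """Return True if a candidate passes the binding filters (sketch)."""
    passes_ic50 = ic50 < binding_threshold
    if percentile_threshold is None:
        # Assumption: with no percentile threshold, only IC50 is checked.
        return passes_ic50
    passes_percentile = percentile < percentile_threshold
    if strategy == "conservative":
        # Must pass BOTH thresholds.
        return passes_ic50 and passes_percentile
    if strategy == "exploratory":
        # Must pass EITHER threshold.
        return passes_ic50 or passes_percentile
    raise ValueError(f"unknown strategy: {strategy!r}")


# A candidate with a good IC50 but a percentile rank above the threshold.
conservative = passes_binding_filters(300.0, 3.0, percentile_threshold=2.0)
exploratory = passes_binding_filters(300.0, 3.0, percentile_threshold=2.0,
                                     strategy="exploratory")
print(conservative, exploratory)
```

In this example the candidate is rejected under the conservative strategy and kept under the exploratory one.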
--allele-specific-binding-thresholds
Use allele-specific binding thresholds. To print the
allele-specific binding thresholds run `pvacsplice
allele_specific_cutoffs`. If an allele does not have a
special threshold value, the `--binding-threshold`
value will be used. (default: False)
-m {lowest,median}, --top-score-metric {lowest,median}
The ic50 scoring metric to use when filtering epitopes
by binding-threshold or minimum fold change. lowest:
Use the best MT Score and Corresponding Fold Change
(i.e. the lowest MT ic50 binding score and
corresponding fold change of all chosen prediction
methods). median: Use the median MT Score and Median
Fold Change (i.e. the median MT ic50 binding score and
fold change of all chosen prediction methods).
(default: median)
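To illustrate how the `lowest` and `median` settings can lead to different filtering outcomes, here is a small sketch. The function `top_score` and the IC50 values are made up for illustration, and the real pipeline also tracks fold change, which is omitted here.

```python
from statistics import median


def top_score(ic50_by_algorithm, metric="median"):
    """Summarise per-algorithm MT IC50 scores with the chosen metric (sketch)."""
    scores = list(ic50_by_algorithm.values())
    if metric == "lowest":
        return min(scores)
    if metric == "median":
        return median(scores)
    raise ValueError(f"unknown metric: {metric!r}")


# Hypothetical MT IC50 predictions (nM) for one epitope.
scores = {"MHCflurry": 120.0, "NetMHCpan": 650.0, "SMM": 900.0}
lowest = top_score(scores, "lowest")
med = top_score(scores, "median")
print(lowest, med)
```

With a binding threshold of 500, this epitope would pass under `lowest` but not under `median`.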
-m2 {ic50,percentile}, --top-score-metric2 {ic50,percentile}
Whether to use median/best IC50 or to use median/best
percentile score when determining the best peptide in
the aggregated report and the top score filter
(filtered report). This parameter is also used to
influence the primary sorting criteria in the
aggregated report for the candidates within each tier
as well as in the filtered report. (default: ic50)
--pass-only Only process VCF entries with a PASS status. (default:
False)
--normal-sample-name NORMAL_SAMPLE_NAME
In a multi-sample VCF, the name of the matched normal
sample. (default: None)
--normal-cov NORMAL_COV
Normal Coverage Cutoff. When creating the filtered.tsv
report, only include epitopes with a normal read depth
above this cutoff. (default: 5)
--tdna-cov TDNA_COV Tumor DNA Coverage Cutoff. When creating the
filtered.tsv report, only include epitopes with a
tumor DNA read depth above this cutoff. (default: 10)
--trna-cov TRNA_COV Tumor RNA Coverage Cutoff. When creating the
filtered.tsv report, only include epitopes with a
tumor RNA read depth above this cutoff. (default: 10)
--normal-vaf NORMAL_VAF
Normal VAF Cutoff in decimal format. When creating the
filtered.tsv report, only include epitopes with a
normal VAF BELOW this cutoff. (default: 0.02)
--tdna-vaf TDNA_VAF Tumor DNA VAF Cutoff in decimal format. When creating
the filtered.tsv report, only include epitopes with a
tumor DNA VAF above this cutoff. When creating the
aggregated.tsv report, use this cutoff to determine if
a candidate is subclonal or not. Only clonal
candidates will be binned into the Pass tier.
(default: 0.25)
--trna-vaf TRNA_VAF Tumor RNA VAF Cutoff in decimal format. When creating
the filtered.tsv report, only include epitopes with a
tumor RNA VAF above this cutoff. This parameter is
also used in combination with the --expn-val cutoff
for tiering candidates when creating the
aggregated.tsv report. This allele expression cutoff
is calculated as --trna-vaf * --expn-val * 10. Only
candidates with Allele Expr (RNA Expr * RNA VAF) above
the allele expr cutoff will be binned into the Pass
tier. (default: 0.25)
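The allele expression cutoff described above is simple arithmetic and can be worked through as follows. The function names are hypothetical, and the sample expression values are invented for illustration.

```python
def allele_expr_cutoff(trna_vaf=0.25, expn_val=1.0):
    """Cutoff as documented: --trna-vaf * --expn-val * 10."""
    return trna_vaf * expn_val * 10


def passes_allele_expr(rna_expr, rna_vaf, trna_vaf=0.25, expn_val=1.0):
    """Allele Expr (RNA Expr * RNA VAF) must be above the cutoff."""
    allele_expr = rna_expr * rna_vaf
    return allele_expr > allele_expr_cutoff(trna_vaf, expn_val)


cutoff = allele_expr_cutoff()          # defaults: 0.25 * 1.0 * 10
high = passes_allele_expr(12.0, 0.3)   # Allele Expr of about 3.6
low = passes_allele_expr(4.0, 0.5)     # Allele Expr of 2.0
print(cutoff, high, low)
```

With the default values the cutoff is 2.5, so the first hypothetical candidate meets the allele expression criterion and the second does not.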
--tumor-purity TUMOR_PURITY
Value between 0 and 1 indicating the fraction of tumor
cells in the tumor sample. Information is used during
aggregate report creation for a simple estimation of
whether variants are subclonal or clonal based on VAF.
If not provided, purity is estimated directly from the
VAFs. (default: None)
--transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY
Specify the criteria to consider when prioritizing and
tiering candidates during aggregate report creation or
filtering during Transcript filtering when creating
the filtered.tsv file. 'canonical' will
prioritize/select candidates resulting from variants
on an Ensembl canonical transcript. 'mane_select' will
prioritize/select candidates resulting from variants
on a MANE select transcript. 'tsl' will
prioritize/select candidates where the transcript
support level (TSL) matches the --maximum-transcript-
support-level. When selecting more than one criterion,
a transcript meeting EITHER of the selected criteria
will be prioritized/selected. Only if the best
transcript of a candidate passes EITHER of the
selected criteria will the candidate be binned into
the Pass tier. (default: ['canonical', 'mane_select',
'tsl'])
--maximum-transcript-support-level {1,2,3,4,5}
The threshold to use for filtering epitopes on the
Ensembl transcript support level (TSL). Epitopes with
a transcript support level <= this cutoff will be
considered a good transcript if 'tsl' is one of the
selected transcript prioritization strategy options.
(default: 1)
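The "EITHER of the selected criteria" rule for `--transcript-prioritization-strategy`, combined with `--maximum-transcript-support-level`, might be sketched like this. The transcript dictionary and its keys (`canonical`, `mane_select`, `tsl`) are hypothetical stand-ins for whatever the pipeline extracts from the VEP annotation.

```python
def is_good_transcript(transcript,
                       strategies=("canonical", "mane_select", "tsl"),
                       maximum_tsl=1):
    """Return True if the transcript meets ANY selected criterion (sketch)."""
    tsl = transcript.get("tsl")
    checks = {
        "canonical": bool(transcript.get("canonical", False)),
        "mane_select": bool(transcript.get("mane_select", False)),
        # A TSL less than or equal to the cutoff counts as good.
        "tsl": tsl is not None and tsl <= maximum_tsl,
    }
    return any(checks[name] for name in strategies)


# Hypothetical transcripts for illustration.
tsl_only = {"canonical": False, "mane_select": False, "tsl": 1}
weak = {"canonical": False, "mane_select": False, "tsl": 3}
print(is_good_transcript(tsl_only), is_good_transcript(weak))
print(is_good_transcript(weak, maximum_tsl=3))
```

Raising the maximum TSL to 3 lets the second transcript qualify through the `tsl` criterion alone.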
--biotypes BIOTYPES A list of biotypes to use for pre-filtering
transcripts for processing in the pipeline. (default:
['protein_coding'])
--allow-incomplete-transcripts
By default, transcripts annotated with incomplete CDS
(i.e., 'cds_start_NF' or 'cds_end_NF' flags in the VEP
CSQ field) are excluded from analysis, as they often
produce invalid protein sequences. Use this flag to
allow candidates from such transcripts. Only peptides
that do not contain 'X' will be included. These
candidates will be deprioritized relative to those
from transcripts without incomplete CDS flags.
(default: False)
--net-chop-method {cterm,20s}
NetChop prediction method to use ("cterm" for C term
3.0, "20s" for 20S 3.0). C-term 3.0 is trained with
publicly available MHC class I ligands and the authors
believe that it performs best in predicting the
boundaries of CTL epitopes. 20S is trained with in
vitro degradation data. (default: None)
--netmhc-stab Run NetMHCStabPan after all filtering and add
stability predictions to predicted epitopes. (default:
False)
--net-chop-threshold NET_CHOP_THRESHOLD
NetChop prediction threshold (increasing the threshold
results in better specificity, but worse sensitivity).
(default: 0.5)
--problematic-amino-acids PROBLEMATIC_AMINO_ACIDS
A list of amino acids to consider as problematic.
During aggregate report creation, only candidates
without problematic positions will be binned into the
Pass tier. Each entry can be specified in the
following format: `amino_acid(s)`: One or more one-
letter amino acid codes. Any occurrence of this amino
acid string, regardless of the position in the
epitope, is problematic. When specifying more than one
amino acid, they will need to occur together in the
specified order. `amino_acid:position`: A one letter
amino acid code, followed by a colon separator,
followed by a positive integer position (one-based).
The occurrence of this amino acid at the position
specified is problematic. E.g., G:2 would check for a
Glycine at the second position of the epitope. The
N-terminus is defined as position 1.
`amino_acid:-position`: A one letter amino acid code,
followed by a colon separator, followed by a negative
integer position. The occurrence of this amino acid at
the specified position from the end of the epitope is
problematic. E.g., G:-3 would check for a Glycine at
the third position from the end of the epitope. The
C-terminus is defined as position -1. (default: None)
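The three entry formats for `--problematic-amino-acids` can be illustrated with a short checker. This is a sketch of the rules as described above, not the pVACsplice code. The function name and the example epitopes are hypothetical, and entries are passed here as a Python list rather than in whatever form the command line expects.

```python
def is_problematic(epitope, entries):
    """Return True if any entry matches the epitope (sketch)."""
    for entry in entries:
        if ":" in entry:
            amino_acid, position = entry.split(":")
            pos = int(position)
            if pos == 0:
                raise ValueError("position must be a non-zero integer")
            # Positive positions are one-based from the N-terminus;
            # negative positions count back from the C-terminus (-1 is last).
            index = pos - 1 if pos > 0 else pos
            in_range = -len(epitope) <= index < len(epitope)
            if in_range and epitope[index] == amino_acid:
                return True
        elif entry in epitope:
            # One or more residues occurring together, at any position.
            return True
    return False


print(is_problematic("AGKLMNPQR", ["G:2"]))    # Glycine at position 2
print(is_problematic("AGKLMNGQR", ["G:-3"]))   # Glycine third from the end
print(is_problematic("AGKLMNPQR", ["KL"]))     # K followed directly by L
print(is_problematic("AGKLMNPQR", ["LK"]))     # order matters
```

The first three calls report a match, while the last does not because `LK` never occurs in that order.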
--run-reference-proteome-similarity
Blast peptides against the reference proteome or
search for peptides in a reference proteome fasta
file. During aggregate report creation, only
candidates without a reference proteome match will be
binned into the Pass tier. (default: False)
--blastp-path BLASTP_PATH
Blastp installation path. (default: None)
--blastp-db {refseq_select_prot,refseq_protein}
The blastp database to use. (default:
refseq_select_prot)
--peptide-fasta PEPTIDE_FASTA
When running the reference proteome similarity step,
use this reference peptide FASTA file to find matches
instead of blastp. (default: None)
-a {sample_name}, --additional-report-columns {sample_name}
Additional columns to output in the final report. If
sample_name is chosen, this will add a column with the
sample name in every row of the output. This can be
useful if you later want to concatenate results from
multiple individuals into a single file. (default:
None)
-s FASTA_SIZE, --fasta-size FASTA_SIZE
Number of FASTA entries per IEDB request. For some
resource-intensive prediction algorithms like
PickPocket and NetMHCpan it might be helpful to reduce
this number. Needs to be an even number. (default:
200)
--exclude-NAs Exclude NA values from the filtered output. (default:
False)
--genes-of-interest-file GENES_OF_INTEREST_FILE
A genes of interest file. Predictions resulting from
variants on genes in this list will be marked in the
result files. The file should be formatted to have
each gene on a separate line without a header line. If
no file is specified, the Cancer Gene Census list of
high-confidence genes is used as the default.
(default: None)
--aggregate-inclusion-binding-threshold AGGREGATE_INCLUSION_BINDING_THRESHOLD
Threshold for including epitopes when creating the
aggregate report (default: 5000)
--aggregate-inclusion-count-limit AGGREGATE_INCLUSION_COUNT_LIMIT
Limit neoantigen candidates included in the aggregate
report to only the best n candidates per variant.
(default: 15)
-j JUNCTION_SCORE, --junction-score JUNCTION_SCORE
Junction Coverage Cutoff. Only sites above this read
depth cutoff will be considered. (default: 10)
-v VARIANT_DISTANCE, --variant-distance VARIANT_DISTANCE
Regulatory variants can lie inside or outside of a
splicing junction. Maximum distance window (upstream
and downstream) for a variant outside the junction.
(default: 100)
-g, --save-gtf Save a tsv file from the uploaded filtered GTF
data. Use this option to bypass GTF data upload time
for multiple pVACsplice runs. (default: False)
--anchor-types ANCHOR_TYPES
The anchor types of junctions to use. Multiple anchors
can be specified using a comma-separated list. Choices:
A, D, NDA (default: ['A', 'D', 'NDA'])
--expn-val EXPN_VAL Expression Cutoff. When creating the filtered.tsv
report, only include epitopes with expression above
this value. When creating the aggregated.tsv report,
only bin candidates into the Pass tier that meet this
threshold. (default: 1.0)