Filtering Commands
pVACsplice currently offers four filters: a binding filter, a coverage filter, a transcript support level filter, and a top score filter.
These filters are always run automatically as part of the pVACsplice pipeline using default cutoffs.
All filters can also be run manually on the filtered.tsv file to narrow the results down further, or they can be run on the all_epitopes.tsv file to apply different filtering thresholds.
The binding filter is used to remove neoantigen candidates that do not meet desired peptide:MHC binding criteria. The coverage filter is used to remove variants that do not meet desired read count and VAF criteria (in normal DNA and tumor DNA/RNA). The transcript support level filter is used to remove variant annotations based on low quality transcript annotations. The top score filter is used to select the most promising peptide candidate for each variant. Multiple candidate peptides from a single somatic variant can be caused by multiple peptide lengths, registers, HLA alleles, and transcript annotations.
Further details on each of these filters are provided below.
Note
The default values for filtering thresholds are suggestions only. While they are based on review of the literature and consultation with our clinical and immunology colleagues, your specific use case will determine the appropriate values.
Binding Filter
usage: pvacsplice binding_filter [-h] [-b BINDING_THRESHOLD]
[-p PERCENTILE_THRESHOLD]
[-m {lowest,median}] [--exclude-NAs] [-a]
input_file output_file
Filter variants processed by IEDB by binding score.
positional arguments:
input_file The all_epitopes.tsv or filtered.tsv pVACseq report
file to filter.
output_file Output .tsv file containing list of filtered epitopes
based on binding affinity.
optional arguments:
-h, --help show this help message and exit
-b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
Report only epitopes where the mutant allele has ic50
binding scores below this value. (default: 500)
-p PERCENTILE_THRESHOLD, --percentile-threshold PERCENTILE_THRESHOLD
Report only epitopes where the mutant allele has a
percentile rank below this value. (default: None)
-m {lowest,median}, --top-score-metric {lowest,median}
The ic50 scoring metric to use when filtering epitopes
by binding-threshold or minimum fold change. lowest:
Use the Best MT IC50 Score, Corresponding Fold Change,
and Best MT Percentile (i.e. use the lowest MT ic50
binding score, corresponding fold change of all chosen
prediction methods, and lowest MT percentile). median:
Use the Median MT IC50 Score, Median Fold Change, and
Median MT Percentile (i.e. use the median MT ic50
binding score, fold change, and MT percentile of all
chosen prediction methods). (default: median)
--exclude-NAs Exclude NA values from the filtered output. (default:
False)
-a, --allele-specific-binding-thresholds
Use allele-specific binding thresholds. To print the
allele-specific binding thresholds run `pvacsplice
allele_specific_cutoffs`. If an allele does not have a
special threshold value, the `--binding-threshold`
value will be used. (default: False)
The binding filter removes variants that don’t pass the chosen binding threshold. The user can choose whether to apply this filter to the lowest or the median binding affinity score by setting the --top-score-metric flag. The lowest binding affinity score is recorded in the Best MT IC50 Score column and represents the lowest IC50 score of all prediction algorithms that were picked during the previous pVACsplice run. The median binding affinity score is recorded in the Median MT IC50 Score column and corresponds to the median IC50 score of all prediction algorithms used to create the report. By default, the binding filter runs on the median binding affinity.
When the --allele-specific-binding-thresholds flag is set, binding cutoffs specific to each prediction’s HLA allele are used instead of the value set via the --binding-threshold parameter. For HLA alleles where no allele-specific binding threshold is available, the --binding-threshold value is used as a fallback. The alleles that have allele-specific thresholds, as well as the values of those thresholds, can be printed by executing the pvacsplice allele_specific_cutoffs command.
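The fallback rule described above can be illustrated with a minimal Python sketch. This is not the pVACsplice implementation: the function name `threshold_for` and the cutoff values in `EXAMPLE_ALLELE_CUTOFFS` are made up for illustration, and the real values are the ones printed by `pvacsplice allele_specific_cutoffs`.

```python
# Made-up cutoffs for illustration only. The real allele-specific values
# are printed by `pvacsplice allele_specific_cutoffs`.
EXAMPLE_ALLELE_CUTOFFS = {"HLA-A*02:01": 300.0, "HLA-B*07:02": 700.0}


def threshold_for(allele, binding_threshold=500.0, allele_specific=False,
                  cutoffs=EXAMPLE_ALLELE_CUTOFFS):
    """Return the IC50 cutoff that applies to one prediction's HLA allele."""
    # Use the allele's own cutoff only when the flag is set and one exists.
    if allele_specific and allele in cutoffs:
        return cutoffs[allele]
    # Otherwise fall back to the general --binding-threshold value.
    return binding_threshold
```

With the flag off, every allele gets the general threshold. With it on, only alleles missing from the cutoff table fall back to it.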
In addition to being able to filter on the IC50 score columns, the binding filter also offers the ability to filter on the percentile score using the --percentile-threshold parameter. When the --top-score-metric is set to lowest, this threshold is applied to the Best MT Percentile column. When it is set to median, the threshold is applied to the Median MT Percentile column.
By default, entries with NA values will be included in the output. This behavior can be turned off by using the --exclude-NAs flag.
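To make the interaction between the options concrete, here is a hedged Python sketch of the per-row decision. It is a simplification, not the pVACsplice source. It uses the column names from this section, but the assumptions are: missing values appear as the string `NA`, a value exactly equal to a cutoff is rejected, and the function name `passes_binding_filter` is hypothetical.

```python
# Columns inspected for each --top-score-metric setting (names from the report).
METRIC_COLUMNS = {
    "lowest": ("Best MT IC50 Score", "Best MT Percentile"),
    "median": ("Median MT IC50 Score", "Median MT Percentile"),
}


def passes_binding_filter(row, binding_threshold=500.0, percentile_threshold=None,
                          top_score_metric="median", exclude_nas=False):
    """Decide whether one report row (a dict of strings) survives the filter."""
    ic50_col, pct_col = METRIC_COLUMNS[top_score_metric]
    checks = [(row.get(ic50_col, "NA"), binding_threshold)]
    # The percentile check only applies when a percentile threshold is given.
    if percentile_threshold is not None:
        checks.append((row.get(pct_col, "NA"), percentile_threshold))
    for value, cutoff in checks:
        if value in ("NA", "", None):
            # NA values are kept unless --exclude-NAs is requested.
            if exclude_nas:
                return False
            continue
        # Assumption: "below this value" is treated as a strict comparison.
        if float(value) >= cutoff:
            return False
    return True
```

For a row with a Best MT IC50 Score of 120 and a Median MT IC50 Score of 640.5, the default settings reject the row, while `top_score_metric="lowest"` keeps it.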
Coverage Filter
usage: pvacsplice coverage_filter [-h] [--normal-cov NORMAL_COV]
[--tdna-cov TDNA_COV] [--trna-cov TRNA_COV]
[--normal-vaf NORMAL_VAF]
[--tdna-vaf TDNA_VAF] [--trna-vaf TRNA_VAF]
[--expn-val EXPN_VAL] [--exclude-NAs]
input_file output_file
Filter variants processed by IEDB by coverage, vaf, and gene expression
positional arguments:
input_file The all_epitopes.tsv or filtered.tsv pVACsplice report
file to filter.
output_file Output .tsv file containing list of filtered epitopes
based on coverage and expression values
optional arguments:
-h, --help show this help message and exit
--normal-cov NORMAL_COV
Normal Coverage Cutoff. Sites above this cutoff will
be considered. (default: 5)
--tdna-cov TDNA_COV Tumor DNA Coverage Cutoff. Sites above this cutoff
will be considered. (default: 10)
--trna-cov TRNA_COV Tumor RNA Coverage Cutoff. Sites above this cutoff
will be considered. (default: 10)
--normal-vaf NORMAL_VAF
Normal VAF Cutoff in decimal format. Sites BELOW this
cutoff in normal will be considered. (default: 0.02)
--tdna-vaf TDNA_VAF Tumor DNA VAF Cutoff in decimal format. Sites above
this cutoff will be considered. (default: 0.25)
--trna-vaf TRNA_VAF Tumor RNA VAF Cutoff in decimal format. Sites above
this cutoff will be considered. (default: 0.25)
--expn-val EXPN_VAL Gene and Transcript Expression cutoff. Sites above
this cutoff will be considered. (default: 1.0)
--exclude-NAs Exclude NA values from the filtered output. (default:
False)
If the pVACsplice input VCF contains readcount and/or expression annotations, then the coverage filter can be run again on the filtered.tsv report file to narrow down the results even further. You can also run this filter again on the all_epitopes.tsv report file to apply different cutoffs.
The general goals of these filters are to limit variants for neoepitope prediction to those with good read support and/or remove possible sub-clonal variants. In some cases the input VCF may have already been filtered in this fashion. This filter also allows for removal of variants that do not have sufficient evidence of RNA expression.
For more details on how to prepare input VCFs that contain all of these annotations, refer to the Input File Preparation section.
By default, entries with NA values will be included in the output. This behavior can be turned off by using the --exclude-NAs flag.
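The cutoffs listed in the usage text can be read as a table of rules, each with a direction. The Python sketch below encodes the default values that way. It is an illustration under stated assumptions rather than the tool's code: the report column names in `COVERAGE_RULES` are assumed, missing values are assumed to appear as `NA`, and values equal to a cutoff are assumed to pass.

```python
# (assumed report column, default cutoff from the usage text, direction)
COVERAGE_RULES = [
    ("Normal Depth", 5, "above"),
    ("Tumor DNA Depth", 10, "above"),
    ("Tumor RNA Depth", 10, "above"),
    ("Normal VAF", 0.02, "below"),   # the only cutoff that is an upper bound
    ("Tumor DNA VAF", 0.25, "above"),
    ("Tumor RNA VAF", 0.25, "above"),
    ("Gene Expression", 1.0, "above"),
    ("Transcript Expression", 1.0, "above"),
]


def passes_coverage_filter(row, rules=COVERAGE_RULES, exclude_nas=False):
    """Return True if a report row (dict of strings) meets every cutoff."""
    for column, cutoff, direction in rules:
        value = row.get(column, "NA")
        if value in ("NA", "", None):
            # Missing annotations are tolerated unless --exclude-NAs is set.
            if exclude_nas:
                return False
            continue
        number = float(value)
        if direction == "above" and number < cutoff:
            return False
        if direction == "below" and number > cutoff:
            return False
    return True
```

A variant without RNA annotations passes under the defaults and is dropped only when `exclude_nas=True`, which mirrors the NA behavior described above.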
Transcript Support Level Filter
usage: pvacsplice transcript_support_level_filter [-h]
[--maximum-transcript-support-level {1,2,3,4,5}]
input_file output_file
Filter variants processed by IEDB by transcript support level
positional arguments:
input_file The all_epitopes.tsv or filtered.tsv pVACsplice report
file to filter.
output_file Output .tsv file containing list of filtered
epitopes based on transcript support level.
optional arguments:
-h, --help show this help message and exit
--maximum-transcript-support-level {1,2,3,4,5}
The threshold to use for filtering epitopes on the
transcript support level. Keep all epitopes with a
transcript support level <= to this cutoff. (default:
1)
This filter is used to eliminate variant annotations based on poorly-supported transcripts. By default, only transcripts with a transcript support level (TSL) of <=1 are kept. This threshold can be adjusted using the --maximum-transcript-support-level parameter.
By default, entries with Not Supported values will be included in the output. These occur if VEP was run without the --tsl flag or if data is aligned to GRCh37 or older.
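The rule above is small enough to sketch in a few lines of Python. This is a hedged illustration, not the pVACsplice source: the column name `Transcript Support Level` and the function name are assumptions, and the sketch assumes each value is either an integer from 1 to 5 or the literal text `Not Supported`.

```python
def passes_tsl_filter(row, maximum_tsl=1, column="Transcript Support Level"):
    """Keep rows whose TSL is <= maximum_tsl; keep 'Not Supported' rows too."""
    value = row.get(column, "Not Supported")
    # Entries without TSL information are retained by default.
    if value == "Not Supported":
        return True
    return int(value) <= maximum_tsl
```

Raising `maximum_tsl` to 3, for example, would retain transcripts with levels 1, 2, and 3.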
Top Score Filter
usage: pvacsplice top_score_filter [-h] [-m {lowest,median}]
[--maximum-transcript-support-level {1,2,3,4,5}]
input_file output_file
Pick the best neoepitope for each variant
positional arguments:
input_file The final report .tsv file to filter.
output_file Output .tsv file containing only the list of the top
epitope per variant.
optional arguments:
-h, --help show this help message and exit
-m {lowest,median}, --top-score-metric {lowest,median}
The ic50 scoring metric to use for filtering. lowest:
Use the best MT Score (i.e. the lowest MT ic50 binding
score of all chosen prediction methods). median: Use
the median MT Score (i.e. the median MT ic50 binding
score of all chosen prediction methods). (default:
median)
--maximum-transcript-support-level {1,2,3,4,5}
When determining the top peptide, only consider those
entries that meet this threshold for the Ensembl
transcript support level (TSL). Transcript support
level needs to be <= this cutoff to be considered.
(default: 1)
This filter picks the top epitope for each splice site variant. The top epitope is determined by first selecting the epitopes with no Problematic Positions and then, among those, selecting the one with the lowest Median MT IC50 Score or Best MT IC50 Score, depending on the --top-score-metric option.
By default, the --top-score-metric option is set to median, which will apply this filter to the Median MT IC50 Score column. If the --top-score-metric option is set to lowest, the Best MT IC50 Score column is used instead.
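The selection order described in this section can be sketched in Python as follows. This is a simplified illustration rather than the tool's implementation, and it leaves out the transcript support level option. The assumptions are: the grouping column `Index` is a hypothetical stand-in for whatever identifies a splice site variant, an empty Problematic Positions entry is written as `None`, `NA`, or an empty string, and when every candidate has problematic positions the sketch falls back to all of them.

```python
from collections import defaultdict

# Score column used for each --top-score-metric setting.
SCORE_COLUMNS = {"lowest": "Best MT IC50 Score", "median": "Median MT IC50 Score"}


def top_epitope_per_variant(rows, top_score_metric="median", variant_key="Index"):
    """Return one row per variant: the best-scoring epitope, preferring
    candidates without problematic positions."""
    score_col = SCORE_COLUMNS[top_score_metric]
    groups = defaultdict(list)
    for row in rows:
        groups[row[variant_key]].append(row)
    picks = []
    for candidates in groups.values():
        # Step 1: prefer epitopes with no Problematic Positions.
        clean = [r for r in candidates
                 if r.get("Problematic Positions", "None") in ("None", "NA", "")]
        # Assumption: if none are clean, consider every candidate instead.
        pool = clean or candidates
        # Step 2: among those, take the lowest IC50 score.
        picks.append(min(pool, key=lambda r: float(r[score_col])))
    return picks
```

Note that a candidate with a better IC50 score can still lose to a weaker binder if it contains a problematic position, because the first step runs before the scores are compared.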