Filtering Commands

pVACsplice currently offers four filters: a binding filter, a coverage filter, a transcript filter, and a top score filter.

These filters are always run automatically as part of the pVACsplice pipeline using default cutoffs.

All filters can also be run manually on the filtered.tsv file to narrow the results down further, or they can be run on the all_epitopes.tsv file to apply different filtering thresholds.

The binding filter is used to remove neoantigen candidates that do not meet desired peptide:MHC binding criteria. The coverage filter is used to remove variants that do not meet desired read count and VAF criteria (in normal DNA and tumor DNA/RNA). The transcript filter is used to remove variant annotations based on low quality transcript annotations. The top score filter is used to select the most promising peptide candidate for each variant. Multiple candidate peptides from a single somatic variant can be caused by multiple peptide lengths, registers, HLA alleles, and transcript annotations.

Further details on each of these filters are provided below.

Note

The default values for filtering thresholds are suggestions only. While they are based on review of the literature and consultation with our clinical and immunology colleagues, your specific use case will determine the appropriate values.

Binding Filter

usage: pvacsplice binding_filter [-h] [-b BINDING_THRESHOLD]
                                 [-p PERCENTILE_THRESHOLD]
                                 [--percentile-threshold-strategy {conservative,exploratory}]
                                 [-m {lowest,median}] [--exclude-NAs] [-a]
                                 input_file output_file

Filter variants processed by IEDB by binding score.

positional arguments:
  input_file            The all_epitopes.tsv or filtered.tsv pVACseq report
                        file to filter.
  output_file           Output .tsv file containing list of filtered epitopes
                        based on binding affinity.

optional arguments:
  -h, --help            show this help message and exit
  -b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
                        Report only epitopes where the mutant allele has ic50
                        binding scores below this value. (default: 500)
  -p PERCENTILE_THRESHOLD, --percentile-threshold PERCENTILE_THRESHOLD
                        Report only epitopes where the mutant allele has a
                        percentile rank below this value. (default: None)
  --percentile-threshold-strategy {conservative,exploratory}
                        Specify the candidate inclusion strategy. The
                        'conservative' option requires a candidate to pass
                        BOTH the binding threshold and percentile threshold
                        (default). The 'exploratory' option requires a
                        candidate to pass EITHER the binding threshold or the
                        percentile threshold. (default: conservative)
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use when filtering epitopes
                        by binding-threshold or minimum fold change. lowest:
                        Use the Best MT IC50 Score, Corresponding Fold Change,
                        and Best MT Percentile (i.e. use the lowest MT ic50
                        binding score, corresponding fold change of all chosen
                        prediction methods, and lowest MT percentile). median:
                        Use the Median MT IC50 Score, Median Fold Change, and
                        Median MT Percentile (i.e. use the median MT ic50
                        binding score, fold change, and MT percentile of all
                        chosen prediction methods). (default: median)
  --exclude-NAs         Exclude NA values from the filtered output. (default:
                        False)
  -a, --allele-specific-binding-thresholds
                        Use allele-specific binding thresholds. To print the
                        allele-specific binding thresholds run `pvacsplice
                        allele_specific_cutoffs`. If an allele does not have a
                        special threshold value, the `--binding-threshold`
                        value will be used. (default: False)

The binding filter removes variants that do not pass the chosen binding threshold. The user can choose whether to apply this filter to the lowest or the median binding affinity score by setting the --top-score-metric flag. The lowest binding affinity score is recorded in the Best MT IC50 Score column and represents the lowest ic50 score of all prediction algorithms that were picked during the previous pVACsplice run. The median binding affinity score is recorded in the Median MT IC50 Score column and corresponds to the median ic50 score of all prediction algorithms used to create the report. By default, the binding filter runs on the median binding affinity. An additional --top-score-metric2 flag allows the user to choose whether to use IC50 or Percentile scores. By default, IC50 is used.

When the --allele-specific-binding-thresholds flag is set, binding cutoffs specific to each prediction’s HLA allele are used instead of the value set via the --binding-threshold parameter. For HLA alleles where no allele-specific binding threshold is available, the --binding-threshold value is used as a fallback. The alleles that have allele-specific thresholds, along with the values of those thresholds, can be printed by executing the pvacsplice allele_specific_cutoffs command.
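The fallback behavior described above can be illustrated with a minimal Python sketch. This is not the pVACsplice implementation: the function name, the allele names, and the cutoff table are hypothetical placeholders, and real cutoffs should be taken from the output of pvacsplice allele_specific_cutoffs.

```python
from typing import Mapping, Optional


def effective_binding_threshold(
    allele: str,
    binding_threshold: float = 500.0,
    allele_specific: bool = False,
    allele_cutoffs: Optional[Mapping[str, float]] = None,
) -> float:
    """Return the IC50 cutoff to apply to a prediction for `allele`.

    Hypothetical helper for illustration only.
    """
    if allele_specific and allele_cutoffs is not None:
        # Use the allele's own cutoff if one exists; otherwise fall back
        # to the value given via --binding-threshold.
        return allele_cutoffs.get(allele, binding_threshold)
    return binding_threshold


# Placeholder table: the allele name and value are made up for this example.
example_cutoffs = {"HLA-X*00:01": 100.0}

with_cutoff = effective_binding_threshold(
    "HLA-X*00:01", allele_specific=True, allele_cutoffs=example_cutoffs
)
without_cutoff = effective_binding_threshold(
    "HLA-X*00:02", allele_specific=True, allele_cutoffs=example_cutoffs
)
print(with_cutoff, without_cutoff)
```

In this sketch the first allele uses its own placeholder cutoff, while the second allele has no entry and therefore receives the default binding threshold of 500.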

In addition to being able to filter on the IC50 score columns, the binding filter also offers the ability to filter on the percentile score using the --percentile-threshold parameter. When the --top-score-metric is set to lowest, this threshold is applied to the Best MT Percentile column. When it is set to median, the threshold is applied to the Median MT Percentile column.

When the --percentile-threshold flag is set, the candidate inclusion strategy can be specified with the --percentile-threshold-strategy parameter. The parameter has two options: conservative (default) and exploratory. The ‘conservative’ option requires a candidate to pass BOTH the binding threshold and percentile threshold, while the ‘exploratory’ option requires a candidate to pass EITHER the binding threshold or percentile threshold.
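To make the difference between the two strategies concrete, the following Python sketch applies them to a single report row. It is an illustration under stated assumptions rather than the tool's code: the function name is hypothetical, the row is assumed to be a dictionary keyed by the report column names mentioned above, "below" is interpreted as a strict comparison, and NA values are not handled.

```python
from typing import Mapping, Optional

# Report columns used for each --top-score-metric value (names from the text above).
SCORE_COLUMNS = {
    "lowest": ("Best MT IC50 Score", "Best MT Percentile"),
    "median": ("Median MT IC50 Score", "Median MT Percentile"),
}


def passes_binding_filter(
    row: Mapping[str, str],
    binding_threshold: float = 500.0,
    percentile_threshold: Optional[float] = None,
    strategy: str = "conservative",
    top_score_metric: str = "median",
) -> bool:
    """Hypothetical sketch of the binding filter decision for one row."""
    ic50_column, percentile_column = SCORE_COLUMNS[top_score_metric]
    ic50_ok = float(row[ic50_column]) < binding_threshold

    # Without a percentile threshold only the IC50 criterion applies.
    if percentile_threshold is None:
        return ic50_ok

    percentile_ok = float(row[percentile_column]) < percentile_threshold
    if strategy == "conservative":
        return ic50_ok and percentile_ok  # must pass BOTH
    if strategy == "exploratory":
        return ic50_ok or percentile_ok  # must pass EITHER
    raise ValueError(f"unknown strategy: {strategy}")


# Example row: passes the IC50 cutoff of 500 but not a percentile cutoff of 2.
row = {"Median MT IC50 Score": "300.0", "Median MT Percentile": "3.0"}
print(passes_binding_filter(row, percentile_threshold=2.0, strategy="conservative"))
print(passes_binding_filter(row, percentile_threshold=2.0, strategy="exploratory"))
```

The example row is rejected under the conservative strategy because it fails the percentile criterion, and kept under the exploratory strategy because it passes the IC50 criterion.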

By default, entries with NA values will be included in the output. This behavior can be turned off by using the --exclude-NAs flag.

Coverage Filter

usage: pvacsplice coverage_filter [-h] [--normal-cov NORMAL_COV]
                                  [--tdna-cov TDNA_COV] [--trna-cov TRNA_COV]
                                  [--normal-vaf NORMAL_VAF]
                                  [--tdna-vaf TDNA_VAF] [--trna-vaf TRNA_VAF]
                                  [--expn-val EXPN_VAL] [--exclude-NAs]
                                  input_file output_file

Filter variants processed by IEDB by coverage, vaf, and gene expression

positional arguments:
  input_file            The all_epitopes.tsv or filtered.tsv pVACsplice report
                        file to filter.
  output_file           Output .tsv file containing list of filtered epitopes
                        based on coverage and expression values

optional arguments:
  -h, --help            show this help message and exit
  --normal-cov NORMAL_COV
                        Normal Coverage Cutoff. Sites above this cutoff will
                        be considered. (default: 5)
  --tdna-cov TDNA_COV   Tumor DNA Coverage Cutoff. Sites above this cutoff
                        will be considered. (default: 10)
  --trna-cov TRNA_COV   Tumor RNA Coverage Cutoff. Sites above this cutoff
                        will be considered. (default: 10)
  --normal-vaf NORMAL_VAF
                        Normal VAF Cutoff in decimal format. Sites BELOW this
                        cutoff in normal will be considered. (default: 0.02)
  --tdna-vaf TDNA_VAF   Tumor DNA VAF Cutoff in decimal format. Sites above
                        this cutoff will be considered. (default: 0.25)
  --trna-vaf TRNA_VAF   Tumor RNA VAF Cutoff in decimal format. Sites above
                        this cutoff will be considered. (default: 0.25)
  --expn-val EXPN_VAL   Gene and Transcript Expression cutoff. Sites above
                        this cutoff will be considered. (default: 1.0)
  --exclude-NAs         Exclude NA values from the filtered output. (default:
                        False)

If the pVACsplice input VCF contains readcount and/or expression annotations, then the coverage filter can be run again on the filtered.tsv report file to narrow down the results even further. You can also run this filter again on the all_epitopes.tsv report file to apply different cutoffs.

The general goals of these filters are to limit variants for neoepitope prediction to those with good read support and/or remove possible sub-clonal variants. In some cases the input VCF may have already been filtered in this fashion. This filter also allows for removal of variants that do not have sufficient evidence of RNA expression.

For more details on how to prepare input VCFs that contain all of these annotations, refer to the Input File Preparation section.

By default, entries with NA values will be included in the output. This behavior can be turned off by using the --exclude-NAs flag.
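The cutoffs listed in the help text can be summarized in a short Python sketch. This is only an illustration of the criteria, not the pVACsplice implementation. The dictionary keys are hypothetical names modeled on the command-line flags (they are not the report's column names), the comparisons are assumed to be inclusive at the boundary, and an NA value (represented here as None) is assumed to skip its check unless NAs are excluded.

```python
from typing import Mapping, Optional

# Default cutoffs, taken from the coverage_filter help text.
DEFAULT_CUTOFFS = {
    "normal_cov": 5,
    "tdna_cov": 10,
    "trna_cov": 10,
    "normal_vaf": 0.02,
    "tdna_vaf": 0.25,
    "trna_vaf": 0.25,
    "expn_val": 1.0,
}

# (site field, cutoff name, direction). "min" keeps values at or above the
# cutoff, "max" keeps values at or below it. Field names are hypothetical.
CHECKS = [
    ("normal_cov", "normal_cov", "min"),
    ("tdna_cov", "tdna_cov", "min"),
    ("trna_cov", "trna_cov", "min"),
    ("normal_vaf", "normal_vaf", "max"),  # the normal VAF must be LOW
    ("tdna_vaf", "tdna_vaf", "min"),
    ("trna_vaf", "trna_vaf", "min"),
    ("gene_expn", "expn_val", "min"),
    ("transcript_expn", "expn_val", "min"),
]


def passes_coverage_filter(
    site: Mapping[str, Optional[float]],
    cutoffs: Mapping[str, float] = DEFAULT_CUTOFFS,
    exclude_nas: bool = False,
) -> bool:
    """Hypothetical sketch of the coverage filter decision for one site."""
    for field, cutoff_name, direction in CHECKS:
        value = site.get(field)
        if value is None:
            # Assumption: an NA value fails only when NAs are excluded.
            if exclude_nas:
                return False
            continue
        cutoff = cutoffs[cutoff_name]
        if direction == "min" and value < cutoff:
            return False
        if direction == "max" and value > cutoff:
            return False
    return True


site = {
    "normal_cov": 30, "tdna_cov": 50, "trna_cov": None,
    "normal_vaf": 0.0, "tdna_vaf": 0.4, "trna_vaf": 0.5,
    "gene_expn": 12.0, "transcript_expn": 3.5,
}
print(passes_coverage_filter(site))
print(passes_coverage_filter(site, exclude_nas=True))
```

The example site has no tumor RNA coverage value, so it is kept by default and removed once NA values are excluded.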

Transcript Filter

usage: pvacsplice transcript_filter [-h]
                                    [--transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY]
                                    [--maximum-transcript-support-level {1,2,3,4,5}]
                                    input_file output_file

Filter variant transcripts processed by IEDB.

positional arguments:
  input_file            The all_epitopes.tsv or filtered.tsv report file to
                        filter.
  output_file           Output .tsv file containing list of filtered epitopes
                        based on the variant transcript.

optional arguments:
  -h, --help            show this help message and exit
  --transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY
                        Specify the criteria to consider when filtering
                        transcripts of the neoantigen candidates. 'canonical'
                        will select candidates resulting from variants on a
                        Ensembl canonical transcript. 'mane_select' will
                        select candidates resulting from variants on a MANE
                        select transcript. 'tsl' will select candidates where
                        the transcript support level (TSL) matches the
                        --maximum-transcript-support-level cutoff. When
                        selecting more than one criteria, a transcript meeting
                        EITHER of the selected criteria will be selected.
                        (default: ['canonical', 'mane_select', 'tsl'])
  --maximum-transcript-support-level {1,2,3,4,5}
                        The threshold to use for filtering epitopes on the
                        Ensembl transcript support level (TSL). Keep all
                        epitopes with a transcript support level <= to this
                        cutoff. (default: 1)

This filter is used to eliminate variant annotations based on poorly supported transcripts. This is assessed based on whether the transcript is the MANE Select transcript, whether it is the canonical transcript, or whether the transcript support level (TSL) meets the --maximum-transcript-support-level cutoff. The --transcript-prioritization-strategy parameter controls which of these three criteria are considered. A neoantigen candidate passes this filter if its transcript passes at least one of the specified criteria.

Transcripts with a TSL of Not Supported will pass the TSL criterion. These values occur if VEP was run without the --tsl flag or if data is aligned to GRCh37 or older.
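The "at least one criterion" rule, including the special handling of a Not Supported TSL, can be sketched in Python as follows. This is an illustration, not the tool's code: the function name is hypothetical, and the entry is assumed to be a dictionary in which MANE Select and Canonical are booleans and Transcript Support Level is either an integer or the string "Not Supported".

```python
from typing import Mapping, Sequence


def passes_transcript_filter(
    entry: Mapping[str, object],
    strategy: Sequence[str] = ("canonical", "mane_select", "tsl"),
    maximum_tsl: int = 1,
) -> bool:
    """Hypothetical sketch: keep an entry if ANY selected criterion holds."""

    def tsl_ok(e: Mapping[str, object]) -> bool:
        tsl = e["Transcript Support Level"]
        # A TSL of Not Supported passes the TSL criterion.
        return tsl == "Not Supported" or int(tsl) <= maximum_tsl

    checks = {
        "canonical": lambda e: bool(e["Canonical"]),
        "mane_select": lambda e: bool(e["MANE Select"]),
        "tsl": tsl_ok,
    }
    return any(checks[criterion](entry) for criterion in strategy)


entry = {"Canonical": False, "MANE Select": False, "Transcript Support Level": 3}
print(passes_transcript_filter(entry))                  # no criterion is met
print(passes_transcript_filter(entry, maximum_tsl=3))   # TSL criterion is met
```

Restricting the strategy to a single criterion, for example strategy=("mane_select",), makes the filter stricter because fewer alternatives can rescue an entry.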

Top Score Filter

usage: pvacsplice top_score_filter [-h] [-m {lowest,median}]
                                   [-m2 {ic50,percentile}]
                                   [--transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY]
                                   [--maximum-transcript-support-level {1,2,3,4,5}]
                                   input_file output_file

Pick the best neoepitope for each variant

positional arguments:
  input_file            The final report .tsv file to filter.
  output_file           Output .tsv file containing only the list of the top
                        epitope per variant.

optional arguments:
  -h, --help            show this help message and exit
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use for filtering. lowest:
                        Use the best MT Score (i.e. the lowest MT ic50 binding
                        score of all chosen prediction methods). median: Use
                        the median MT Score (i.e. the median MT ic50 binding
                        score of all chosen prediction methods). (default:
                        median)
  -m2 {ic50,percentile}, --top-score-metric2 {ic50,percentile}
                        Whether to use median/best IC50 or to use median/best
                        percentile score when determining the top scoring
                        peptide. This parameter is also used to influence the
                        primary sorting criteria for the variants in the
                        output report. (default: ic50)
  --transcript-prioritization-strategy TRANSCRIPT_PRIORITIZATION_STRATEGY
                        Specify the criteria to consider when filtering
                        transcripts of the neoantigen candidates. 'canonical'
                        will select candidates resulting from variants on a
                        Ensembl canonical transcript. 'mane_select' will
                        select candidates resulting from variants on a MANE
                        select transcript. 'tsl' will select candidates where
                        the transcript support level (TSL) matches the
                        --maximum-transcript-support-level cutoff. When
                        selecting more than one criteria, a transcript meeting
                        EITHER of the selected criteria will be selected.
                        (default: ['canonical', 'mane_select', 'tsl'])
  --maximum-transcript-support-level {1,2,3,4,5}
                        When determining the top peptide, only consider those
                        entries that meet this threshold for the Ensembl
                        transcript support level (TSL). Transcript support
                        level needs to be <= this cutoff to be considered.
                        (default: 1)

This filter picks the top epitope for each junction according to the following criteria:

  • If the --allow-incomplete-transcripts flag is set, pick the entries that have no Transcript CDS Flags set.

  • Of the remaining entries, pick the entries where the Biotype is protein_coding.

  • Of the remaining entries, pick the entries that pass at least one of the transcript criteria selected in the --transcript-prioritization-strategy, taking into consideration the --maximum-transcript-support-level if tsl is one of the selected criteria.

  • Of the remaining entries, pick the entries with no Problematic Positions.

  • Sort the remaining entries by lowest Median|Best IC50 Score|Percentile (depending on the selected --top-score-metric and --top-score-metric2), MANE Select (True), Canonical (True), Transcript Support Level, WT Protein Length, and Transcript Expression. Select the highest sorted entry.
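The final sorting step can be sketched in Python as a tuple-valued sort key. This covers only the last bullet and is not the pVACsplice implementation. The function names are hypothetical, and the entries are assumed to be dictionaries with boolean MANE Select and Canonical values and numeric values elsewhere. The source only states that the score is sorted lowest first; the directions used for Transcript Support Level (lower first), WT Protein Length (longer first), and Transcript Expression (higher first) are assumptions.

```python
from typing import Mapping, Sequence


def sort_key(
    entry: Mapping[str, object],
    top_score_metric: str = "median",
    top_score_metric2: str = "ic50",
) -> tuple:
    """Hypothetical sort key; smaller tuples rank higher."""
    prefix = "Best" if top_score_metric == "lowest" else "Median"
    suffix = "IC50 Score" if top_score_metric2 == "ic50" else "Percentile"
    return (
        float(entry[f"{prefix} MT {suffix}"]),   # lowest score first
        not entry["MANE Select"],                # True sorts before False
        not entry["Canonical"],                  # True sorts before False
        entry["Transcript Support Level"],       # assumption: lower first
        -entry["WT Protein Length"],             # assumption: longer first
        -entry["Transcript Expression"],         # assumption: higher first
    )


def pick_top_entry(entries: Sequence[Mapping[str, object]], **metrics: str):
    """Return the highest-sorted entry among the remaining candidates."""
    return min(entries, key=lambda e: sort_key(e, **metrics))


candidates = [
    {"Median MT IC50 Score": 120.0, "MANE Select": False, "Canonical": True,
     "Transcript Support Level": 1, "WT Protein Length": 400,
     "Transcript Expression": 5.0, "name": "A"},
    {"Median MT IC50 Score": 120.0, "MANE Select": True, "Canonical": True,
     "Transcript Support Level": 1, "WT Protein Length": 400,
     "Transcript Expression": 5.0, "name": "B"},
]
print(pick_top_entry(candidates)["name"])
```

Both example candidates have the same median IC50 score, so the tie is broken by the MANE Select flag and candidate B is selected.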

Aggregate Report Filter

usage: pvacsplice aggregate_report_filter [-h] [--include-tiers INCLUDE_TIERS]
                                          input_file output_file

Filter an aggregate report based on the variant Tier.

positional arguments:
  input_file            The aggregated.tsv report file to filter.
  output_file           Output aggregated.tsv file containing list of filtered
                        aggregate report entries based on the selected variant
                        Tier.

optional arguments:
  -h, --help            show this help message and exit
  --include-tiers INCLUDE_TIERS
                        Specify a comma-separated list of tiers for which to
                        retain aggregate report variant data. (default:
                        ['Pass'])

This command filters the aggregate report to only those variants matching the specified --include-tiers (default: Pass).
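As an illustration of what this selection amounts to, the Python sketch below keeps only the rows of a tab-separated report whose Tier value is in a comma-separated list. It is not the tool's code: the function name, the ID column, and the tier name "Other" are placeholders invented for the example, and only the Tier column and the Pass tier come from the text above.

```python
import csv
import io
from typing import Dict, List


def filter_by_tier(tsv_text: str, include_tiers: str = "Pass") -> List[Dict[str, str]]:
    """Hypothetical sketch: keep rows whose Tier is in `include_tiers`."""
    # --include-tiers takes a comma-separated list of tier names.
    tiers = {tier.strip() for tier in include_tiers.split(",")}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["Tier"] in tiers]


# Toy report: "ID" and the tier name "Other" are placeholders.
report = "ID\tTier\nvariant1\tPass\nvariant2\tOther\n"
kept = filter_by_tier(report)
print([row["ID"] for row in kept])
```

With the default of Pass, only the first toy row is retained. Passing include_tiers="Pass,Other" would retain both.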