Filtering Commands

pVACfuse currently offers three filters: a binding filter, a coverage filter, and a top score filter.

All filters are run automatically as part of the pVACfuse pipeline.

All filters can also be run manually to narrow the final results down further or to redefine the filters entirely and produce a new candidate list from the all_epitopes.tsv file.

Note

The default values for filtering thresholds are suggestions only. While they are based on review of the literature and consultation with our clinical and immunology colleagues, your specific use case will determine the appropriate values.

Binding Filter

usage: pvacfuse binding_filter [-h] [-b BINDING_THRESHOLD]
                               [-p PERCENTILE_THRESHOLD] [-m {lowest,median}]
                               [--exclude-NAs] [-a]
                               input_file output_file

Filter variants processed by IEDB by binding score.

positional arguments:
  input_file            The final report .tsv file to filter.
  output_file           Output .tsv file containing list of filtered epitopes
                        based on binding affinity.

optional arguments:
  -h, --help            show this help message and exit
  -b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
                        Report only epitopes where the mutant allele has ic50
                        binding scores below this value. (default: 500)
  -p PERCENTILE_THRESHOLD, --percentile-threshold PERCENTILE_THRESHOLD
                        Report only epitopes where the mutant allele has a
                        percentile rank below this value. (default: None)
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use when filtering epitopes
                        by binding-threshold or minimum fold change. lowest:
                        Use the Best MT IC50 Score, Corresponding Fold Change,
                        and Best MT Percentile (i.e. use the lowest MT ic50
                        binding score, corresponding fold change of all chosen
                        prediction methods, and lowest MT percentile). median:
                        Use the Median MT IC50 Score, Median Fold Change, and
                        Median MT Percentile (i.e. use the median MT ic50
                        binding score, fold change, and MT percentile of all
                        chosen prediction methods). (default: median)
  --exclude-NAs         Exclude NA values from the filtered output. (default:
                        False)
  -a, --allele-specific-binding-thresholds
                        Use allele-specific binding thresholds. To print the
                        allele-specific binding thresholds run `pvacfuse
                        allele_specific_cutoffs`. If an allele does not have a
                        special threshold value, the `--binding-threshold`
                        value will be used. (default: False)

The binding filter removes variants that don’t pass the chosen binding threshold. The user can choose whether to apply this filter to the lowest or the median binding affinity score by setting the --top-score-metric flag. The lowest binding affinity score is recorded in the Best IC50 Score column and represents the lowest ic50 score of all prediction algorithms that were picked during the previous pVACfuse run. The median binding affinity score is recorded in the Median IC50 Score column and corresponds to the median ic50 score of all prediction algorithms used to create the report. By default, the binding filter runs on the median binding affinity.
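The column choice described above can be sketched in a few lines of Python. This is a minimal illustration, not pVACfuse's implementation: the function name `binding_filter` and the example rows are hypothetical, the column names are taken from the paragraph above, and the strict "below" comparison follows the wording of the help text (exact boundary handling is an assumption).

```python
# Illustrative sketch only; not the pVACfuse source code.
# Maps each --top-score-metric choice to the column named in the text above.
IC50_COLUMN = {
    "lowest": "Best IC50 Score",
    "median": "Median IC50 Score",
}


def binding_filter(rows, binding_threshold=500.0, top_score_metric="median"):
    """Keep rows whose selected IC50 score is below the threshold."""
    column = IC50_COLUMN[top_score_metric]
    return [row for row in rows if float(row[column]) < binding_threshold]


# Hypothetical report rows (values are made up for the example).
rows = [
    {"Epitope": "EPITOPE_A", "Best IC50 Score": "120.0", "Median IC50 Score": "650.0"},
    {"Epitope": "EPITOPE_B", "Best IC50 Score": "300.0", "Median IC50 Score": "410.0"},
]

kept_median = [r["Epitope"] for r in binding_filter(rows)]
kept_lowest = [r["Epitope"] for r in binding_filter(rows, top_score_metric="lowest")]
print(kept_median)  # ['EPITOPE_B']
print(kept_lowest)  # ['EPITOPE_A', 'EPITOPE_B']
```

EPITOPE_A passes only under the lowest metric, because one algorithm predicts strong binding while the median across algorithms stays above 500.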

When the --allele-specific-binding-thresholds flag is set, binding cutoffs specific to each prediction’s HLA allele are used instead of the value set via the --binding-threshold parameter. For HLA alleles where no allele-specific binding threshold is available, the --binding-threshold value is used as a fallback. The alleles that have allele-specific thresholds, as well as the values of those thresholds, can be printed by executing the pvacfuse allele_specific_cutoffs command.
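The fallback rule can be illustrated with a small lookup. This is a sketch under stated assumptions: `effective_threshold`, `EXAMPLE_CUTOFFS`, the allele names, and the value 250.0 are all hypothetical placeholders. The real cutoffs are the ones printed by pvacfuse allele_specific_cutoffs.

```python
# Illustrative sketch only; not the pVACfuse source code.
# Hypothetical cutoff table. The real values come from
# `pvacfuse allele_specific_cutoffs`.
EXAMPLE_CUTOFFS = {"ALLELE_WITH_CUTOFF": 250.0}


def effective_threshold(allele, binding_threshold=500.0,
                        allele_specific=False, cutoffs=EXAMPLE_CUTOFFS):
    """Return the IC50 cutoff that applies to one prediction's HLA allele."""
    if not allele_specific:
        return binding_threshold
    # Fall back to the --binding-threshold value when the allele
    # has no allele-specific cutoff.
    return cutoffs.get(allele, binding_threshold)


with_cutoff = effective_threshold("ALLELE_WITH_CUTOFF", allele_specific=True)
without_cutoff = effective_threshold("ALLELE_WITHOUT_CUTOFF", allele_specific=True)
flag_unset = effective_threshold("ALLELE_WITH_CUTOFF")
print(with_cutoff, without_cutoff, flag_unset)  # 250.0 500.0 500.0
```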

In addition to being able to filter on the IC50 score columns, the binding filter also offers the ability to filter on the percentile score using the --percentile-threshold parameter. When the --top-score-metric is set to lowest, this threshold is applied to the Best Percentile column. When it is set to median, the threshold is applied to the Median Percentile column.

By default, entries with NA values will be included in the output. This behavior can be turned off by using the --exclude-NAs flag.
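A minimal sketch of how the percentile threshold and the NA handling interact is shown below. It is not pVACfuse's implementation: `passes_percentile` and the example rows are hypothetical, NA values are assumed to appear as the literal string "NA", and the strict "below" comparison follows the help text.

```python
# Illustrative sketch only; not the pVACfuse source code.
# Maps each --top-score-metric choice to the percentile column named above.
PERCENTILE_COLUMN = {
    "lowest": "Best Percentile",
    "median": "Median Percentile",
}


def passes_percentile(row, percentile_threshold, top_score_metric="median",
                      exclude_nas=False):
    """Check one row against the percentile threshold."""
    value = row[PERCENTILE_COLUMN[top_score_metric]]
    if value == "NA":
        # NA entries are kept unless --exclude-NAs is set.
        return not exclude_nas
    return float(value) < percentile_threshold


# Hypothetical report rows (values are made up for the example).
rows = [
    {"Epitope": "EPITOPE_A", "Best Percentile": "0.5", "Median Percentile": "3.0"},
    {"Epitope": "EPITOPE_B", "Best Percentile": "NA", "Median Percentile": "NA"},
]

default_keep = [r["Epitope"] for r in rows if passes_percentile(r, 2.0)]
lowest_keep = [r["Epitope"] for r in rows
               if passes_percentile(r, 2.0, top_score_metric="lowest")]
strict_keep = [r["Epitope"] for r in rows
               if passes_percentile(r, 2.0, top_score_metric="lowest",
                                    exclude_nas=True)]
print(default_keep)  # ['EPITOPE_B']
print(lowest_keep)   # ['EPITOPE_A', 'EPITOPE_B']
print(strict_keep)   # ['EPITOPE_A']
```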

Coverage Filter

usage: pvacfuse coverage_filter [-h] [--read-support READ_SUPPORT]
                                [--expn-val EXPN_VAL] [--exclude-NAs]
                                input_file output_file

Filter variants processed by IEDB by read support and expression

positional arguments:
  input_file            The final report .tsv file to filter
  output_file           Output .tsv file containing list of filtered epitopes
                        based on coverage and expression values

optional arguments:
  -h, --help            show this help message and exit
  --read-support READ_SUPPORT
                        Read Support Cutoff. Sites above this cutoff will be
                        considered. (default: 5)
  --expn-val EXPN_VAL   Expression Cutoff. Expression is measured as FFPM
                        (fusion fragments per million total reads). Sites
                        above this cutoff will be considered. (default: 0.1)
  --exclude-NAs         Exclude NA values from the filtered output. (default:
                        False)

If a pVACfuse process has been run with Arriba data, Read Support information will be available. If AGFusion data was used as input, a STAR-Fusion file must have been provided in the run in order to make Read Support and Expression information available.

The coverage filter can be run again on the filtered.tsv report file to narrow down the results even further. You can also run this filter on the all_epitopes.tsv report file to apply different cutoffs.

The general goal of this filter is to limit variants for neoepitope prediction to those with good read support. In some cases the input data may have already been filtered in this fashion. This filter also allows for removal of variants that do not have sufficient evidence of RNA expression.

By default, entries with NA values will be included in the output. This behavior can be turned off by using the --exclude-NAs flag.
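The two cutoffs and the NA behavior can be sketched as follows. This is an illustration under stated assumptions, not the pVACfuse implementation: the function `coverage_filter`, the column names "Read Support" and "Expression", the "NA" string marker, and the strict "above" comparison (taken from the help text wording) are all assumptions, and the rows are made up.

```python
# Illustrative sketch only; not the pVACfuse source code.
def coverage_filter(rows, read_support=5, expn_val=0.1, exclude_nas=False):
    """Keep rows that are above both the read support and expression cutoffs."""

    def passes(value, cutoff):
        if value == "NA":
            # NA entries are kept unless --exclude-NAs is set.
            return not exclude_nas
        return float(value) > cutoff

    return [
        row for row in rows
        if passes(row["Read Support"], read_support)
        and passes(row["Expression"], expn_val)
    ]


# Hypothetical report rows (values are made up for the example).
rows = [
    {"Epitope": "EPITOPE_A", "Read Support": "12", "Expression": "0.8"},
    {"Epitope": "EPITOPE_B", "Read Support": "3", "Expression": "1.5"},
    {"Epitope": "EPITOPE_C", "Read Support": "NA", "Expression": "NA"},
]

kept_default = [r["Epitope"] for r in coverage_filter(rows)]
kept_no_nas = [r["Epitope"] for r in coverage_filter(rows, exclude_nas=True)]
print(kept_default)  # ['EPITOPE_A', 'EPITOPE_C']
print(kept_no_nas)   # ['EPITOPE_A']
```

EPITOPE_B is dropped in both cases because its read support of 3 is not above the default cutoff of 5, even though its expression is high.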

Top Score Filter

usage: pvacfuse top_score_filter [-h] [-m {lowest,median}]
                                 input_file output_file

Pick the best neoepitope for each variant

positional arguments:
  input_file            The final report .tsv file to filter.
  output_file           Output .tsv file containing only the list of the top
                        epitope per variant.

optional arguments:
  -h, --help            show this help message and exit
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use for filtering. lowest:
                        Use the best MT Score (i.e. the lowest MT ic50 binding
                        score of all chosen prediction methods). median: Use
                        the median MT Score (i.e. the median MT ic50 binding
                        score of all chosen prediction methods). (default:
                        median)

This filter picks the top epitope for a variant. Epitopes with the same Chromosome - Start - Stop - Reference - Variant are identified as coming from the same variant.

In order to account for different splice sites among the transcripts of a variant that would lead to different peptides, this filter also takes into account the different transcripts returned by AGFusion/Arriba and will return the top epitope for each transcript if they are non-identical. If the top epitopes for the transcripts of a variant are identical, the epitope for the transcript with the highest expression is returned. If this information is not available, the transcript with the lowest Ensembl ID is returned.

By default, the --top-score-metric option is set to median, which applies this filter to the Median IC50 Score column and picks the epitope with the lowest median mutant ic50 score for each variant. If the --top-score-metric option is set to lowest, the Best IC50 Score column is used instead to make this determination.
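The per-variant selection can be sketched as a simple group-and-minimize step. This is a simplified illustration, not pVACfuse's implementation: it groups only on the Chromosome, Start, Stop, Reference, and Variant columns and deliberately leaves out the per-transcript handling described above. The function name `top_score_filter` and the example rows are hypothetical.

```python
# Illustrative sketch only; not the pVACfuse source code.
# Transcript-level handling (splice sites, expression, Ensembl ID) is omitted.
VARIANT_KEY = ("Chromosome", "Start", "Stop", "Reference", "Variant")
IC50_COLUMN = {"lowest": "Best IC50 Score", "median": "Median IC50 Score"}


def top_score_filter(rows, top_score_metric="median"):
    """Return the row with the lowest selected IC50 score for each variant."""
    column = IC50_COLUMN[top_score_metric]
    best = {}
    for row in rows:
        key = tuple(row[name] for name in VARIANT_KEY)
        # Keep the first row seen on ties (strict comparison).
        if key not in best or float(row[column]) < float(best[key][column]):
            best[key] = row
    return list(best.values())


# Hypothetical report rows (all values are placeholders).
variant_1 = {"Chromosome": "chrA", "Start": "100", "Stop": "200",
             "Reference": "fusion", "Variant": "fusion"}
variant_2 = {"Chromosome": "chrB", "Start": "300", "Stop": "400",
             "Reference": "fusion", "Variant": "fusion"}
rows = [
    {**variant_1, "Epitope": "EPITOPE_A",
     "Best IC50 Score": "50.0", "Median IC50 Score": "400.0"},
    {**variant_1, "Epitope": "EPITOPE_B",
     "Best IC50 Score": "90.0", "Median IC50 Score": "150.0"},
    {**variant_2, "Epitope": "EPITOPE_C",
     "Best IC50 Score": "300.0", "Median IC50 Score": "320.0"},
]

top_median = [r["Epitope"] for r in top_score_filter(rows)]
top_lowest = [r["Epitope"] for r in top_score_filter(rows, top_score_metric="lowest")]
print(top_median)  # ['EPITOPE_B', 'EPITOPE_C']
print(top_lowest)  # ['EPITOPE_A', 'EPITOPE_C']
```

The choice of metric changes the winner for the first variant: EPITOPE_B has the lower median score, while EPITOPE_A has the lower best score.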