
Filtering Commands

pVACseq currently offers three filters: a binding filter, a coverage filter, and a top score filter.

The binding filter is always run automatically as part of the pVACseq pipeline. The coverage filter is run automatically if bam-readcount or Cufflinks files are provided as additional input files to a pVACseq run. The top score filter is run if the --top-result-per-mutation flag is set.

All filters can also be run manually to narrow the final results down further.

Binding Filter

usage: pvacseq binding_filter [-h] [-b BINDING_THRESHOLD]
                              [-c MINIMUM_FOLD_CHANGE] [-m {lowest,median}]
                              [--exclude-NAs]
                              input_file output_file

positional arguments:
  input_file            The final report .tsv file to filter
  output_file           Output .tsv file containing list of filtered epitopes
                        based on binding affinity

optional arguments:
  -h, --help            show this help message and exit
  -b BINDING_THRESHOLD, --binding-threshold BINDING_THRESHOLD
                        Report only epitopes where the mutant allele has ic50
                        binding scores below this value. Default: 500
  -c MINIMUM_FOLD_CHANGE, --minimum-fold-change MINIMUM_FOLD_CHANGE
                        Minimum fold change between mutant binding score and
                        wild-type score. The default is 0, which filters no
                        results, but 1 is often a sensible option (requiring
                        that binding is better to the MT than WT). Default: 0
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use when filtering epitopes
                        by binding-threshold or minimum fold change. lowest:
                        Best MT Score/Corresponding Fold Change - lowest MT
                        ic50 binding score/corresponding fold change of all
                        chosen prediction methods. median: Median MT
                        Score/Median Fold Change - median MT ic50 binding
                        score/fold change of all chosen prediction methods.
                        Default: median
  --exclude-NAs         Exclude NA values from the filtered output. Default:
                        False

The binding filter removes variants that don’t pass the chosen binding threshold. The user can choose whether to apply this filter to the lowest or the median binding affinity score by setting the --top-score-metric flag. The lowest binding affinity score is recorded in the Best MT Score column and represents the lowest ic50 score of all prediction algorithms that were picked during the previous pVACseq run. The median binding affinity score is recorded in the Median MT Score column and corresponds to the median ic50 score of all prediction algorithms used to create the report. By default, the binding filter runs on the median binding affinity.

The binding filter also offers the option to filter on the Fold Change columns, which contain the ratio of the WT score to the MT score. This option can be activated by setting the --minimum-fold-change threshold. If the --top-score-metric option is set to lowest, the Corresponding Fold Change column will be used (Corresponding WT Score/Best MT Score). If the --top-score-metric option is set to median, the Median Fold Change column will be used (Median WT Score/Median MT Score).

By default, entries with NA values will be included in the output. This behavior can be turned off by using the --exclude-NAs flag.
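The decision described above can be sketched in a few lines of Python. This is an illustrative sketch only, not pVACseq's own implementation: the function names are hypothetical, the column names are the ones listed in this section, and two details are assumptions, namely that a score must be strictly below the threshold to pass and that an NA value skips its criterion unless --exclude-NAs is set.

```python
def parse_value(row, column):
    """Return the column as a float, or None when it is missing or NA."""
    raw = row.get(column, "NA")
    return None if raw in ("NA", "") else float(raw)


def passes_binding_filter(row, binding_threshold=500.0, minimum_fold_change=0.0,
                          top_score_metric="median", exclude_nas=False):
    """Hypothetical helper mirroring the binding filter's documented options."""
    # Column pair selected by --top-score-metric.
    if top_score_metric == "lowest":
        score_column, fold_change_column = "Best MT Score", "Corresponding Fold Change"
    else:
        score_column, fold_change_column = "Median MT Score", "Median Fold Change"

    score = parse_value(row, score_column)
    fold_change = parse_value(row, fold_change_column)

    # Assumption: an NA in either column drops the row only with --exclude-NAs;
    # otherwise the criterion that has no value is skipped.
    if exclude_nas and (score is None or fold_change is None):
        return False
    if score is not None and score >= binding_threshold:
        return False
    if fold_change is not None and fold_change < minimum_fold_change:
        return False
    return True


example = {"Best MT Score": "400", "Corresponding Fold Change": "2.0",
           "Median MT Score": "800", "Median Fold Change": "NA"}
# Median score 800 is not below 500, so the row is rejected by default.
print(passes_binding_filter(example))
# Best score 400 is below 500, so the row passes with the lowest metric.
print(passes_binding_filter(example, top_score_metric="lowest"))
```

The example row shows why the choice of metric matters: the same epitope is rejected on its median score but kept on its best score.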

Coverage Filter

usage: pvacseq coverage_filter [-h] [--normal-cov NORMAL_COV]
                               [--tdna-cov TDNA_COV] [--trna-cov TRNA_COV]
                               [--normal-vaf NORMAL_VAF] [--tdna-vaf TDNA_VAF]
                               [--trna-vaf TRNA_VAF] [--expn-val EXPN_VAL]
                               [--exclude-NAs]
                               input_file output_file

positional arguments:
  input_file            The final report .tsv file to filter
  output_file           Output .tsv file containing list of filtered epitopes
                        based on coverage and expression values

optional arguments:
  -h, --help            show this help message and exit
  --normal-cov NORMAL_COV
                        Normal Coverage Cutoff. Sites above this cutoff will
                        be considered. Default: 5
  --tdna-cov TDNA_COV   Tumor DNA Coverage Cutoff. Sites above this cutoff
                        will be considered. Default: 10
  --trna-cov TRNA_COV   Tumor RNA Coverage Cutoff. Sites above this cutoff
                        will be considered. Default: 10
  --normal-vaf NORMAL_VAF
                        Normal VAF Cutoff. Sites BELOW this cutoff in normal
                        will be considered. Default: 2
  --tdna-vaf TDNA_VAF   Tumor DNA VAF Cutoff. Sites above this cutoff will be
                        considered. Default: 40
  --trna-vaf TRNA_VAF   Tumor RNA VAF Cutoff. Sites above this cutoff will be
                        considered. Default: 40
  --expn-val EXPN_VAL   Gene and Transcript Expression cutoff. Sites above
                        this cutoff will be considered. Default: 1
  --exclude-NAs         Exclude NA values from the filtered output. Default:
                        False

If a pVACseq process has been run with bam-readcount or Cufflinks input files, the coverage filter can be run again on the final report file to narrow down the results even further.

If no additional coverage input files were provided to the main pVACseq run, this information must be added to the report manually, under the appropriate column headers, before this filter can be run. Columns available for this filter are Tumor DNA Depth, Tumor DNA VAF, Tumor RNA Depth, Tumor RNA VAF, Normal Depth, Normal VAF, Gene Expression, and Transcript Expression.

By default, entries with NA values will be included in the output. This behavior can be turned off by using the --exclude-NAs flag.
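As a rough illustration of how the cutoffs map onto the report columns, the following Python sketch checks one report row against the default values from the help text. It is not pVACseq's own code: the names DEFAULT_CUTOFFS and passes_coverage_filter are hypothetical, the pairing of each option with its column is inferred from the option and column names, and treating the cutoffs as inclusive is an assumption.

```python
import operator

# Column -> (comparison, default cutoff). Pairing inferred from the option names;
# inclusive comparisons are an assumption, not confirmed behavior.
DEFAULT_CUTOFFS = {
    "Normal Depth": (operator.ge, 5),            # --normal-cov
    "Tumor DNA Depth": (operator.ge, 10),        # --tdna-cov
    "Tumor RNA Depth": (operator.ge, 10),        # --trna-cov
    "Normal VAF": (operator.le, 2),              # --normal-vaf (sites BELOW cutoff)
    "Tumor DNA VAF": (operator.ge, 40),          # --tdna-vaf
    "Tumor RNA VAF": (operator.ge, 40),          # --trna-vaf
    "Gene Expression": (operator.ge, 1),         # --expn-val
    "Transcript Expression": (operator.ge, 1),   # --expn-val
}


def passes_coverage_filter(row, cutoffs=DEFAULT_CUTOFFS, exclude_nas=False):
    """Hypothetical helper: True when every available column meets its cutoff."""
    for column, (compare, cutoff) in cutoffs.items():
        raw = row.get(column, "NA")
        if raw in ("NA", ""):
            # NA values are kept by default and rejected with --exclude-NAs.
            if exclude_nas:
                return False
            continue
        if not compare(float(raw), cutoff):
            return False
    return True


row = {"Normal Depth": "30", "Normal VAF": "0.5",
       "Tumor DNA Depth": "60", "Tumor DNA VAF": "45",
       "Tumor RNA Depth": "NA", "Tumor RNA VAF": "NA",
       "Gene Expression": "12.3", "Transcript Expression": "4.1"}
print(passes_coverage_filter(row))                    # RNA columns are NA and skipped
print(passes_coverage_filter(row, exclude_nas=True))  # NA values now reject the row
```

Keeping the cutoffs in a single table makes it easy to see that Normal VAF is the only column filtered from above; every other column must reach a minimum.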

Top Score Filter

usage: pvacseq top_score_filter [-h] [-m {lowest,median}]
                                input_file output_file

positional arguments:
  input_file            The final report .tsv file to filter
  output_file           Output .tsv file containing only the list of the top
                        epitope per variant

optional arguments:
  -h, --help            show this help message and exit
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use for filtering. lowest:
                        Best MT Score - lowest MT ic50 binding score of all
                        chosen prediction methods. median: Median MT Score -
                        median MT ic50 binding score of all chosen prediction
                        methods. Default: median

This filter picks the top epitope for each variant. By default, the --top-score-metric option is set to median, which applies this filter to the Median MT Score column and picks the epitope with the lowest median mutant ic50 score for each variant. If the --top-score-metric option is set to lowest, the Best MT Score column is used instead.
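The selection can be sketched in Python as a single pass that keeps the lowest-scoring row per variant. This is a minimal sketch rather than pVACseq's implementation: the function name is hypothetical, the columns used to identify a variant (Chromosome, Start, Stop, Reference, Variant) and the MT Epitope Seq column in the sample rows are assumptions about the report layout, and ties are resolved by keeping the first row seen.

```python
def top_score_filter(rows, top_score_metric="median",
                     variant_key=("Chromosome", "Start", "Stop", "Reference", "Variant")):
    """Hypothetical helper: keep the row with the lowest score for each variant."""
    score_column = "Best MT Score" if top_score_metric == "lowest" else "Median MT Score"
    best = {}  # variant key -> best row so far (dicts preserve insertion order)
    for row in rows:
        key = tuple(row[column] for column in variant_key)
        # Assumption: variant_key columns uniquely identify a variant in the report.
        if key not in best or float(row[score_column]) < float(best[key][score_column]):
            best[key] = row
    return list(best.values())


variant_a = {"Chromosome": "1", "Start": "100", "Stop": "101",
             "Reference": "A", "Variant": "T"}
variant_b = {"Chromosome": "2", "Start": "500", "Stop": "501",
             "Reference": "G", "Variant": "C"}
rows = [
    dict(variant_a, **{"MT Epitope Seq": "AAA", "Median MT Score": "300", "Best MT Score": "50"}),
    dict(variant_a, **{"MT Epitope Seq": "BBB", "Median MT Score": "120", "Best MT Score": "90"}),
    dict(variant_b, **{"MT Epitope Seq": "CCC", "Median MT Score": "410", "Best MT Score": "200"}),
]
for kept in top_score_filter(rows):
    print(kept["MT Epitope Seq"], kept["Median MT Score"])
```

With the median metric, the second epitope wins for the first variant; switching to the lowest metric would select the first epitope instead, because its Best MT Score is lower.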