
Output Files

The pVACseq pipeline will write its results in separate folders depending on which prediction algorithms were chosen:

  • MHC_Class_I: for MHC class I prediction algorithms

  • MHC_Class_II: for MHC class II prediction algorithms

  • combined: If both MHC class I and MHC class II prediction algorithms were run, this folder combines the neoepitope predictions from both

Each folder will contain the same list of output files (listed in the order created):

File Name

Description

<sample_name>.tsv

An intermediate file with variant, transcript, coverage, vaf, and expression information parsed from the input files.

<sample_name>.tsv_<chunks> (multiple)

The above file but split into smaller chunks for easier processing with IEDB.

<sample_name>.fasta

A fasta file with mutant and wildtype peptide subsequences for all processable variant-transcript combinations.

<sample_name>.net_chop.fa (optional)

A fasta file with mutant and wildtype peptide subsequences specific for use in running the net_chop tool.

<sample_name>.<MHC_I|MHC_II|Combined>.all_epitopes.tsv

A list of all predicted epitopes and their binding affinity scores, with additional variant information from the <sample_name>.tsv. Only epitopes resulting from supported variants (missense, inframe indels, and frameshifts) are included. If the --pass-only flag is set, variants that have a FILTER set in the VCF are excluded.

<sample_name>.<MHC_I|MHC_II|Combined>.filtered.tsv

The above file after applying all filters, with (optionally) cleavage site, stability predictions, and reference proteome similarity metrics added.

<sample_name>.<MHC_I|MHC_II|Combined>.all_epitopes.aggregated.tsv

An aggregated version of the all_epitopes.tsv file that gives information about the best epitope for each mutation in an easy-to-read format. Not generated when running only with presentation and immunogenicity algorithms.

<sample_name>.<MHC_I|MHC_II|Combined>.all_epitopes.aggregated.tsv.reference_matches (optional)

A file outlining details of reference proteome matches

<sample_name>.<MHC_I|MHC_II|Combined>.all_epitopes.aggregated.metrics.json

A JSON file with detailed information about the predicted epitopes, formatted for pVACview. This file, in combination with the aggregated.tsv file, is required to visualize your results in pVACview.

Various R files

pVACview R Shiny application files.

www (directory)

Directory containing image files for pVACview.

<sample_name>.MHC_I.all_epitopes.aggregated.ML_predict.tsv (optional)

A version of the <sample_name>.MHC_I.all_epitopes.aggregated.tsv with ML-based neoantigen evaluation predictions. Generated when both MHC Class I and Class II predictions are run and the --run-ml-predictions flag is set. Written only to the MHC_Class_I folder.

Filters applied to the filtered.tsv file

The filtered.tsv file is the all_epitopes file with the following filters applied (in order):

  • Binding Filter

  • Coverage Filter

  • Transcript Filter

  • Top Score Filter

Please see the Standalone Filter Commands documentation for more information on each individual filter. The standalone filter commands may be useful to reproduce the filtering or to choose different filtering thresholds.

all_epitopes.tsv and filtered.tsv Report Columns

Column Name

Description

Chromosome

The chromosome of this variant

Start

The start position of this variant in the zero-based, half-open coordinate system

Stop

The stop position of this variant in the zero-based, half-open coordinate system

Reference

The reference allele

Variant

The alt allele

Transcript

The Ensembl ID of the affected transcript

Transcript Support Level

The transcript support level (TSL) of the affected transcript. Not Supported if the VCF entry doesn’t contain TSL information.

Transcript Length

The protein sequence length of the affected transcript

MANE Select (True/False/Not Run)

Whether or not the affected transcript is the MANE Select transcript. Not Run if VCF was VEP-annotated without the --mane_select flag.

Canonical (True/False/Not Run)

Whether or not the affected transcript is the Canonical transcript. Not Run if VCF was VEP-annotated without the --canonical flag.

Biotype

The biotype of the affected transcript

Transcript CDS Flags

A list of CDS flags set on the transcript by VEP. None if there are none.

Ensembl Gene ID

The Ensembl ID of the affected gene

Variant Type

The type of variant. missense for missense mutations, inframe_ins for inframe insertions, inframe_del for inframe deletions, and FS for frameshift variants

Mutation

The amino acid change of this mutation

Protein Position

The protein position of the mutation

Gene Name

The Ensembl gene name of the affected gene

HGVSc

The HGVS coding sequence variant name

HGVSp

The HGVS protein sequence variant name

HLA Allele

The HLA allele for this prediction

Peptide Length

The peptide length of the epitope

Sub-peptide Position

The one-based position of the epitope within the protein sequence used to make the prediction

Mutation Position

A comma-separated list of all amino acid positions in the MT Epitope Seq that are different from the WT Epitope Seq. NA if the WT Epitope Seq is NA.

MT Epitope Seq

The mutant epitope sequence

WT Epitope Seq

The wildtype (reference) epitope sequence at the same position in the full protein sequence. NA if there is no wildtype sequence at this position or if more than half of the amino acids of the mutant epitope are mutated

Best MT IC50 Score Method

Prediction algorithm with the lowest mutant IC50 binding affinity for this epitope

Best MT IC50 Score

Lowest IC50 binding affinity of all prediction algorithms used

Corresponding WT IC50 Score

IC50 binding affinity of the wildtype epitope. NA if there is no WT Epitope Seq.

Corresponding Fold Change

Corresponding WT IC50 Score / Best MT IC50 Score. NA if there is no WT Epitope Seq.

Best MT Percentile Method

Prediction algorithm with the lowest percentile rank for this epitope

Best MT Percentile

Lowest percentile rank of all prediction algorithms used (those that provide percentile output)

Corresponding WT Percentile

Percentile rank of the wildtype epitope using the Best MT Percentile Method. NA if there is no WT Epitope Seq.

Best MT IC50 Percentile Method

Binding prediction algorithm with the lowest binding percentile rank for this epitope

Best MT IC50 Percentile

Lowest binding percentile rank of all binding prediction algorithms used (those that provide percentile output)

Corresponding WT IC50 Percentile

Binding percentile rank of the wildtype epitope using the Best MT IC50 Percentile Method. NA if there is no WT Epitope Seq.

Best MT Immunogenicity Percentile Method

Immunogenicity prediction algorithm with the lowest immunogenicity percentile rank for this epitope

Best MT Immunogenicity Percentile

Lowest immunogenicity percentile rank of all immunogenicity prediction algorithms used (those that provide percentile output)

Corresponding WT Immunogenicity Percentile

Immunogenicity percentile rank of the wildtype epitope using the Best MT Immunogenicity Percentile Method. NA if there is no WT Epitope Seq.

Best MT Presentation Percentile Method

Presentation prediction algorithm with the lowest presentation percentile rank for this epitope

Best MT Presentation Percentile

Lowest presentation percentile rank of all presentation prediction algorithms used (those that provide percentile output)

Corresponding WT Presentation Percentile

Presentation percentile rank of the wildtype epitope using the Best MT Presentation Percentile Method. NA if there is no WT Epitope Seq.

Tumor DNA Depth

Tumor DNA depth at this position. NA if VCF entry does not contain tumor DNA readcount annotation.

Tumor DNA VAF

Tumor DNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain tumor DNA readcount annotation.

Tumor RNA Depth

Tumor RNA depth at this position. NA if VCF entry does not contain tumor RNA readcount annotation.

Tumor RNA VAF

Tumor RNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain tumor RNA readcount annotation.

Normal Depth

Normal DNA depth at this position. NA if VCF entry does not contain normal DNA readcount annotation.

Normal VAF

Normal DNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain normal DNA readcount annotation.

Gene Expression

Gene expression value for the annotated gene containing the variant. NA if VCF entry does not contain gene expression annotation.

Transcript Expression

Transcript expression value for the annotated transcript containing the variant. NA if VCF entry does not contain transcript expression annotation.

Median MT IC50 Score

Median IC50 binding affinity of the mutant epitope across all binding prediction algorithms used

Median WT IC50 Score

Median IC50 binding affinity of the wildtype epitope across all binding prediction algorithms used. NA if there is no WT Epitope Seq.

Median Fold Change

Median WT IC50 Score / Median MT IC50 Score. NA if there is no WT Epitope Seq.

Median MT Percentile

Median percentile rank of the mutant epitope across all prediction algorithms (those that provide percentile output)

Median WT Percentile

Median percentile rank of the wildtype epitope across all prediction algorithms used (those that provide percentile output). NA if there is no WT Epitope Seq.

Median MT IC50 Percentile

Median binding percentile rank of the mutant epitope across all binding prediction algorithms (those that provide percentile output)

Median WT IC50 Percentile

Median binding percentile rank of the wildtype epitope across all binding prediction algorithms used (those that provide percentile output). NA if there is no WT Epitope Seq.

Median MT Immunogenicity Percentile

Median immunogenicity percentile rank of the mutant epitope across all immunogenicity prediction algorithms (those that provide percentile output)

Median WT Immunogenicity Percentile

Median immunogenicity percentile rank of the wildtype epitope across all immunogenicity prediction algorithms used (those that provide percentile output). NA if there is no WT Epitope Seq.

Median MT Presentation Percentile

Median presentation percentile rank of the mutant epitope across all presentation prediction algorithms (those that provide percentile output)

Median WT Presentation Percentile

Median presentation percentile rank of the wildtype epitope across all presentation prediction algorithms used (those that provide percentile output). NA if there is no WT Epitope Seq.

Individual Prediction Algorithm WT and MT Scores and Percentiles (multiple)

IC50 binding affinity scores, binding scores, presentation scores, processing scores, or immunogenicity scores, as well as percentile ranks, for the MT Epitope Seq and WT Epitope Seq for the individual prediction algorithms used. Percentile scores may be NA if not provided by the prediction algorithm.

Index

A unique identifier for this variant-transcript combination

Problematic Positions (optional)

A list of positions in the MT Epitope Seq that match the problematic amino acids defined by the --problematic-amino-acids parameter

Gene of Interest (T/F)

Is the Gene Name found in the genes of interest list?

cterm_7mer_gravy_score

Mean hydropathy of last 7 residues on the C-terminus of the peptide

max_7mer_gravy_score

Max GRAVY score of any kmer in the amino acid sequence. Used to determine if there are any extremely hydrophobic regions within a longer amino acid sequence.

difficult_n_terminal_residue (T/F)

Is N-terminal amino acid a Glutamine, Glutamic acid, or Cysteine?

c_terminal_cysteine (T/F)

Is the C-terminal amino acid a Cysteine?

c_terminal_proline (T/F)

Is the C-terminal amino acid a Proline?

cysteine_count

Number of Cysteines in the amino acid sequence. Problematic because they can form disulfide bonds across distant parts of the peptide

n_terminal_asparagine (T/F)

Is the N-terminal amino acid an Asparagine?

asparagine_proline_bond_count

Number of Asparagine-Proline bonds. Problematic because they can spontaneously cleave the peptide

Best Cleavage Position (optional)

Position of the highest predicted cleavage score

Best Cleavage Score (optional)

Highest predicted cleavage score

Cleavage Sites (optional)

List of all cleavage positions and their cleavage score

Predicted Stability (optional)

Stability of the pMHC-I complex

Half Life (optional)

Half-life of the pMHC-I complex

Stability Rank (optional)

The % rank stability of the pMHC-I complex

NetMHCstab allele (optional)

Nearest neighbor to the HLA Allele. Used for NetMHCstab prediction
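The fold change columns above are simple ratios of wildtype to mutant scores. As a minimal sketch, assuming the report is read as a tab-separated file whose missing values are the literal string NA, the Corresponding Fold Change could be recomputed as follows. The helper names and the sample values are hypothetical and only serve as illustration; the column names are the ones listed in the table.

```python
import csv
import io
from typing import Optional


def parse_score(value: str) -> Optional[float]:
    """Convert a TSV cell to float, mapping the literal 'NA' to None."""
    return None if value == "NA" else float(value)


def fold_change(wt_ic50: Optional[float], mt_ic50: float) -> Optional[float]:
    """Return WT IC50 / MT IC50, or None when there is no wildtype score."""
    if wt_ic50 is None:
        return None
    return wt_ic50 / mt_ic50


# Illustrative rows only; real reports contain many more columns.
SAMPLE_TSV = (
    "MT Epitope Seq\tBest MT IC50 Score\tCorresponding WT IC50 Score\n"
    "KLMGIVYKV\t50.0\t1000.0\n"
    "SLADEAEVY\t120.0\tNA\n"
)


def fold_changes(tsv_text: str) -> list:
    """Recompute the fold change for every row of a report-like TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        fold_change(
            parse_score(row["Corresponding WT IC50 Score"]),
            float(row["Best MT IC50 Score"]),
        )
        for row in reader
    ]
```

The same ratio applies to the Median Fold Change column, using Median WT IC50 Score and Median MT IC50 Score instead.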

pVACseq output file columns illustration

all_epitopes.aggregated.tsv Report Columns

The all_epitopes.aggregated.tsv file is an aggregated version of the all_epitopes TSV. It shows the best-scoring epitope for each variant, and outputs additional binding affinity, expression, and coverage information for that epitope. It also gives information about the total number of well-scoring epitopes for each variant, the number of transcripts covered by those epitopes, as well as the HLA alleles that those epitopes are well-binding to. Lastly, the report will bin variants into tiers that offer suggestions as to the suitability of variants for use in vaccines.

Additionally, a metrics.json file gets created, containing metadata about the Best Peptide as well as alternate neoantigen candidates for each variant. This file can be loaded into pVACview in conjunction with the aggregated report in order to visualize the candidates. To limit the size of the metrics.json file, only a limited number of neoantigen candidates are included. Only neoantigen candidates meeting the --aggregate-inclusion-binding-threshold are included in this file (default: 5000). If the number of unique epitopes for a mutation meeting this threshold exceeds the --aggregate-inclusion-count-limit, only the top n epitopes up to this limit are included (default: 15). The method for selecting the top n epitopes is analogous to the one used to determine the best-scoring epitope. For each epitope of a mutation, all result entries (i.e. for different HLA alleles and transcripts) meeting the --aggregate-inclusion-binding-threshold are considered and the best entry is selected. The best entries for each epitope are then sorted by the transcript biotype, the transcript support level, whether or not the anchor criteria were passed, the MT IC50 score, the transcript length, and the MT percentile. From this sorted list the top n entries are selected up to the --aggregate-inclusion-count-limit.

If the Best Peptide does not meet the aggregate inclusion criteria, it will still be included in the metrics.json file and counted in the Num Included Peptides.

Whether the median or the lowest binding affinity metrics are used for determining the included epitopes, selecting the best-scoring epitope, and which values are output in the IC50 and %ile columns is controlled by the --top-score-metric parameter.
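The inclusion rules for the metrics.json file can be sketched roughly as below. This is an illustration under stated assumptions, not the pVACseq implementation: the Entry class and its field names are hypothetical, "meeting the threshold" is taken to mean an MT IC50 score below the threshold, and the sort directions (protein_coding first, lower TSL first, anchor-passing first, lower IC50 first, longer transcript first, lower percentile first) are assumed. The special case of always including the Best Peptide is not covered.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One result row for an epitope (hypothetical field names)."""
    epitope: str
    biotype: str
    tsl: int
    anchor_pass: bool
    mt_ic50: float
    transcript_length: int
    mt_percentile: float


def sort_key(entry: Entry) -> tuple:
    """Assumed ordering: smaller tuples sort first (are 'better')."""
    return (
        entry.biotype != "protein_coding",  # protein_coding entries first
        entry.tsl,                          # lower TSL first
        not entry.anchor_pass,              # anchor-passing entries first
        entry.mt_ic50,                      # stronger binders first
        -entry.transcript_length,           # longer transcripts first
        entry.mt_percentile,                # lower percentile first
    )


def select_included(entries, binding_threshold=5000.0, count_limit=15):
    """Pick the best entry per epitope, then keep the top count_limit."""
    eligible = [e for e in entries if e.mt_ic50 < binding_threshold]
    best_per_epitope = {}
    for entry in sorted(eligible, key=sort_key):
        # The first entry seen for an epitope is its best one.
        best_per_epitope.setdefault(entry.epitope, entry)
    ranked = sorted(best_per_epitope.values(), key=sort_key)
    return ranked[:count_limit]
```

The two default values mirror the documented defaults of --aggregate-inclusion-binding-threshold (5000) and --aggregate-inclusion-count-limit (15).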

Column Name

Description

ID

A unique identifier for the variant

Index

A unique identifier for the variant and Best Transcript

HLA Alleles (multiple)

For each HLA allele in the run, the number of this variant’s epitopes that bound well to the HLA allele (with median/lowest mutant binding affinity < binding_threshold)

Gene

The Ensembl gene name of the affected gene

AA Change

The amino acid change for the mutation

Num Passing Transcripts

The number of transcripts for this mutation that resulted in at least one well-binding peptide (median/lowest mutant binding affinity < 500).

Best Peptide

The best mutant epitope sequence (see Best Peptide Criteria below for more details on how this is determined)

Best Transcript

The best transcript of all transcripts coding for the Best Peptide (see Best Peptide Criteria below for more details on how this is determined)

MANE Select (True/False/Not Run)

Whether or not the Best Transcript is the MANE Select transcript. Not Run if VCF was VEP-annotated without the --mane_select flag.

Canonical (True/False/Not Run)

Whether or not the Best Transcript is the Canonical transcript. Not Run if VCF was VEP-annotated without the --canonical flag.

TSL

The Transcript Support Level of the Best Transcript. Not Supported if the reference is GRCh37 or older.

Allele

The Allele that the Best Peptide is binding to

Pos

A comma-separated list of all amino acid positions in the MT Epitope Seq that are different from the WT Epitope Seq. NA if the WT Epitope Seq is NA.

Prob Pos

A list of positions in the Best Peptide that are problematic. None if none of the Best Peptide amino acids are problematic or if the --problematic-amino-acids parameter was not set during the pVACseq run.

Num Included Peptides

The number of included peptides according to the --aggregate-inclusion-binding-threshold and --aggregate-inclusion-count-limit

Num Passing Peptides

The number of included peptides for this mutation that are well-binding.

IC50 MT

Median or lowest IC50 binding affinity of the Best Peptide across all prediction algorithms used

IC50 WT

Median or lowest IC50 binding affinity of the corresponding wildtype epitope across all prediction algorithms used

%ile MT

Median or lowest percentile rank of the Best Peptide across all prediction algorithms used

%ile WT

Median or lowest percentile rank of the corresponding wildtype epitope across all prediction algorithms used

IC50 %ile MT

Median or lowest binding percentile rank of the Best Peptide across all binding prediction algorithms used

IC50 %ile WT

Median or lowest binding percentile rank of the corresponding wildtype epitope across all binding prediction algorithms used

IM %ile MT

Median or lowest immunogenicity percentile rank of the Best Peptide across all immunogenicity prediction algorithms used

IM %ile WT

Median or lowest immunogenicity percentile rank of the corresponding wildtype epitope across all immunogenicity prediction algorithms used

Pres %ile MT

Median or lowest presentation percentile rank of the Best Peptide across all presentation prediction algorithms used

Pres %ile WT

Median or lowest presentation percentile rank of the corresponding wildtype epitope across all presentation prediction algorithms used

RNA Expr

Gene expression value for the annotated gene containing the variant.

RNA VAF

Tumor RNA variant allele frequency (VAF) at this position.

Allele Expr

RNA Expr * RNA VAF

RNA Depth

Tumor RNA depth at this position.

DNA VAF

Tumor DNA variant allele frequency (VAF) at this position.

Tier

A tier suggesting the suitability of variants for use in vaccines.

Ref Match (True/False/Not Run)

Whether or not there is a match of the mutated peptide sequence to the reference proteome. Not Run if the --run-reference-proteome-similarity flag was not set during the pVACseq run.

Evaluation

Column to store the evaluation of each variant when evaluating the run in pVACview. Either Accept, Reject, or Review.

<sample_name>.MHC_I.all_epitopes.aggregated.ML_predict.tsv Report Columns

The <sample_name>.MHC_I.all_epitopes.aggregated.ML_predict.tsv file is generated when using the add_ml_predictions tool or when running pVACseq with both MHC Class I and Class II predictions and the --run-ml-predictions flag enabled. This file contains all columns from the Class I aggregated file (all_epitopes.aggregated.tsv) with one additional ML prediction column added.

The file is written to the same folder as the Class I aggregated file (MHC_Class_I within the output directory).

Column Name

Description

All columns from <sample_name>.MHC_I.all_epitopes.aggregated.tsv

All columns described in the all_epitopes.aggregated.tsv Report Columns section above are included in this file.

Evaluation

Populated with ML-predicted evaluation status for each candidate. Values: Accept for variants with prediction probability >= ml-threshold-accept (default: 0.55), Reject for variants with prediction probability <= ml-threshold-reject (default: 0.30), and Pending for variants with prediction probability between ml-threshold-reject and ml-threshold-accept or when the ML model cannot make a prediction due to missing data.

ML Prediction (score)

ML-based prediction evaluation with probability score. Format: "<Evaluation> (<probability_score>)" (e.g., "Accept (0.72)", "Reject (0.15)", "Review (0.48)"). Shows "NA" when the ML model cannot make a prediction due to missing data (e.g., when Class I and Class II aggregated files have different numbers of rows).
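The threshold logic described for the Evaluation column can be sketched as a small function. This is only an illustration of the documented cutoffs, not the tool's code: the function name is hypothetical, and a missing prediction is represented here by None.

```python
from typing import Optional


def ml_evaluation(
    probability: Optional[float],
    threshold_accept: float = 0.55,
    threshold_reject: float = 0.30,
) -> str:
    """Map an ML prediction probability to an evaluation status."""
    if probability is None:
        # The model could not make a prediction (e.g. missing data).
        return "Pending"
    if probability >= threshold_accept:
        return "Accept"
    if probability <= threshold_reject:
        return "Reject"
    # Strictly between the reject and accept thresholds.
    return "Pending"
```

The default arguments correspond to the documented defaults of ml-threshold-accept (0.55) and ml-threshold-reject (0.30).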

Best Peptide Criteria

To determine the Best Peptide, all peptides meeting the --aggregate-inclusion-binding-threshold and --aggregate-inclusion-count-limit (see above) for a variant are evaluated as follows:

  • If the --allow-incomplete-transcripts flag is set, pick the entries without any Transcript CDS Flags set.

  • Of the remaining entries, pick the entries where the Biotype is protein_coding.

  • Of the remaining entries, pick the entries that pass at least one of the transcript criteria selected in the --transcript-prioritization-strategy taking into consideration the --maximum-transcript-support-level if tsl is one of the selected criteria.

  • Of the remaining entries, pick the entries with no Problematic Positions.

  • Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below)

  • For the remaining entries, calculate a rank for all the metrics specified via the --top-score-metric2 parameter and sum them. Whether the lowest or median value is considered for each metric is controlled by the --top-score-metric parameter. Sort the remaining entries on this sum rank followed by the rank of the first --top-score-metric2 specified (to break any ties in the sum rank), MANE Select (True), Canonical (True), Transcript Support Level, Transcript Length, and Transcript Expression. Select the highest sorted entry.
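The stepwise narrowing and the final rank-sum sort can be sketched as follows. This is a simplified illustration, not the pVACseq implementation. Several points are assumptions: entries are plain dictionaries with hypothetical keys, a filtering step that would remove every entry is skipped (the previous pool is kept), ranks are dense with lower metric values ranking better, and longer transcripts and higher expression sort first. The CDS flag and transcript prioritization steps are omitted for brevity.

```python
def narrow(entries, predicate):
    """Keep matching entries; fall back to the full pool if none match."""
    kept = [e for e in entries if predicate(e)]
    return kept or entries


def dense_ranks(values):
    """Rank values ascending; equal values share the same rank."""
    order = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]


def pick_best(entries, metrics):
    """Return the best entry of a non-empty list for the given metric keys."""
    pool = list(entries)
    pool = narrow(pool, lambda e: e["biotype"] == "protein_coding")
    pool = narrow(pool, lambda e: not e["problematic_positions"])
    pool = narrow(pool, lambda e: e["anchor_pass"])

    # One list of ranks per metric, aligned with the pool order.
    ranks = {m: dense_ranks([e[m] for e in pool]) for m in metrics}

    def key(i):
        e = pool[i]
        return (
            sum(ranks[m][i] for m in metrics),  # sum rank
            ranks[metrics[0]][i],               # tie breaker: first metric
            not e["mane_select"],               # MANE Select (True) first
            not e["canonical"],                 # Canonical (True) first
            e["tsl"],                           # lower TSL first
            -e["transcript_length"],            # longer transcript first
            -e["transcript_expression"],        # higher expression first
        )

    return pool[min(range(len(pool)), key=key)]
```

Here metrics stands in for the values passed via --top-score-metric2, and each metric key is expected to already hold the median or lowest value chosen by --top-score-metric.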

The pVACseq Aggregate Report Tiers

Tiering Parameters

To tier the Best Peptide, several cutoffs can be adjusted using arguments provided to the pVACseq run:

Parameter

Description

Default

--binding-threshold

The threshold used for filtering epitopes on the IC50 MT binding affinity.

500

--allele-specific-binding-thresholds

Instead of the hard cutoff set by the --binding-threshold, use allele-specific binding thresholds. For alleles where no allele-specific binding threshold is available, use the --binding-threshold as a fallback. To print a list of alleles that have specific binding thresholds and the value of those thresholds, run pvacseq allele_specific_cutoffs.

False

--binding-percentile-threshold

Use this threshold to filter epitopes on the IC50 %ile MT score.

2.0

--presentation-percentile-threshold

Use this threshold to filter epitopes on the Pres %ile MT score.

2.0

--immunogenicity-percentile-threshold

Use this threshold to filter epitopes on the IM %ile MT score.

2.0

--percentile-threshold-strategy

Specify the candidate inclusion strategy. The conservative option requires a candidate to pass the binding threshold, the binding percentile threshold, the presentation percentile threshold, AND the immunogenicity percentile threshold. The exploratory option requires a candidate to pass EITHER the binding threshold, the binding percentile threshold, the presentation percentile threshold, OR the immunogenicity percentile threshold.

conservative

--tumor-purity

Value between 0 and 1 indicating the fraction of tumor cells in the tumor sample. Information is used for a simple estimation of whether variants are subclonal or clonal based on VAF. If not provided, purity is estimated directly from the VAFs.

None

--trna-vaf

Tumor RNA VAF Cutoff. Used to calculate the allele expression cutoff for tiering.

0.25

--trna-cov

Tumor RNA Coverage Cutoff. Used as a cutoff for tiering.

10

--expn-val

Gene and Transcript Expression cutoff. Used to calculate the allele expression cutoff for tiering.

1.0

--transcript-prioritization-strategy

Which transcript-specific criteria to consider to pass a transcript.

['mane_select', 'canonical', 'tsl']

--maximum-transcript-support-level

The threshold to evaluate an epitope’s best transcript on the Ensembl transcript support level (TSL). Transcript support level needs to be <= this cutoff to be included in most tiers when tsl is included as a transcript prioritization strategy.

1

--allele-specific-anchors

Use allele-specific anchor positions when tiering epitopes in the aggregate report. This option is available for 8, 9, 10, and 11mers and only for HLA-A, B, and C alleles. If this option is not enabled or as a fallback for unsupported lengths and alleles, the default positions of [1, 2, epitope length - 1, and epitope length] are used. Please see https://doi.org/10.1101/2020.12.08.416271 for more details.

False

--anchor-contribution-threshold

For determining allele-specific anchors, each position is assigned a score based on how binding is influenced by mutations. From these scores, the relative contribution of each position to the overall binding is calculated. Starting with the highest relative contribution, positions whose score together account for the selected contribution threshold are assigned as anchor locations. As a result, a higher threshold leads to the inclusion of more positions to be considered anchors.

0.8

--run-reference-proteome-similarity

Set this flag in order to run reference proteome similarity analysis and enable RefMatch tiering. Use --blastp-path, --blastp-db, and --peptide-fasta parameters to configure your run.

False

--problematic-amino-acids

Configure this parameter in order to define amino acids problematic for the desired therapy delivery platform and enable ProbPos tiering.

None
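The anchor selection described for --anchor-contribution-threshold can be sketched as a cumulative selection over per-position scores. This is a hedged illustration rather than the actual implementation: the function names and the input format (a mapping from one-based position to score) are hypothetical, and a position is added as long as the contribution accumulated so far is still below the threshold.

```python
def anchor_positions(scores, contribution_threshold=0.8):
    """Select anchor positions by descending relative contribution.

    scores maps a one-based peptide position to its (non-negative) score.
    """
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    anchors = []
    cumulative = 0.0
    for position, score in ranked:
        if cumulative >= contribution_threshold:
            break  # selected positions already account for the threshold
        anchors.append(position)
        cumulative += score / total  # relative contribution of this position
    return sorted(anchors)


def default_anchor_positions(epitope_length):
    """Fallback anchors: positions 1, 2, length - 1, and length."""
    return [1, 2, epitope_length - 1, epitope_length]
```

With this approach, raising the threshold can only add positions, which matches the statement that a higher threshold leads to more positions being considered anchors.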

Tiers

Given the thresholds provided above, the Best Peptide is evaluated and binned into a tier as follows:

Tier

Criteria

Pass

Best Peptide passes the scores, reference match, expression, transcript, clonal, problematic position, and anchor criteria

PoorBinder

Best Peptide fails the binding criteria but passed the presentation, immunogenicity, reference match, expression, transcript, clonal, problematic position, and anchor criteria

PoorPresentation

Best Peptide fails the presentation criteria but passed the binding, immunogenicity, reference match, expression, transcript, clonal, problematic position, and anchor criteria

PoorImmunogenicity

Best Peptide fails the immunogenicity criteria but passed the binding, presentation, reference match, expression, transcript, clonal, problematic position, and anchor criteria

RefMatch

Best Peptide fails the reference match criteria but passes the scores, expression, transcript, clonal, problematic position, and anchor criteria

PoorTranscript

Best Peptide fails the transcript criteria but passes the scores, reference match, expression, clonal, problematic position, and anchor criteria

LowExpr

Best Peptide meets the low expression criteria and passes the scores, reference match, transcript, clonal, problematic position, and anchor criteria

Anchor

Best Peptide fails the anchor criteria but passes the scores, reference match, expression, transcript, clonal, and problematic position criteria

Subclonal

Best Peptide fails the clonal criteria but passes the scores, reference match, expression, transcript, problematic position, and anchor criteria

ProbPos

Best Peptide fails the problematic position criteria but passes the scores, reference match, expression, transcript, clonal, and anchor criteria

Poor

Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria

NoExpr

Best Peptide is not expressed (RNA Expr == 0 or RNA VAF == 0)

Criteria Details

Criteria

Description

Evaluation Logic

Binding Criteria

Pass if Best Peptide is strong binder

binding score criteria: IC50 MT < binding_threshold

binding percentile score criteria: IC50 %ile MT < binding_percentile_threshold

conservative --percentile-threshold-strategy: needs to pass BOTH the binding score criteria AND the binding percentile score criteria

exploratory --percentile-threshold-strategy: needs to pass EITHER the binding score criteria OR the binding percentile score criteria

Presentation Criteria

Pass if the Best Peptide is presented by the MHC

Pres %ile MT < presentation_percentile_threshold

Immunogenicity Criteria

Pass if the Best Peptide is immunogenic

IM %ile MT < immunogenicity_percentile_threshold

Scores Criteria

Pass if the Best Peptide is a strong binder, presented by the MHC, and/or immunogenic

conservative --percentile-threshold-strategy: needs to pass the binding criteria, the presentation criteria, AND the immunogenicity criteria

exploratory --percentile-threshold-strategy: needs to pass the binding criteria, the presentation criteria, OR the immunogenicity criteria

Expression Criteria

Pass if Best Transcript is expressed

Allele Expr > trna_vaf * expn_val

Reference Match Criteria

Pass if there are no reference proteome matches

Ref Match == False

Transcript Criteria

Pass if Best Transcript matches any of the user-specified --transcript-prioritization-strategy criteria

TSL <= maximum_transcript_support_level (if --transcript-prioritization-strategy includes tsl)

MANE Select == True (if --transcript-prioritization-strategy includes mane_select)

Canonical == True (if --transcript-prioritization-strategy includes canonical)

Low Expression Criteria

Peptide has low expression, or no gene expression but sufficient RNA VAF and coverage

(0 < Allele Expr < trna_vaf * expn_val) OR (RNA Expr == 0 AND RNA Depth > trna_cov AND RNA VAF > trna_vaf)

Anchor Criteria

Fail if there are <= 2 mutated amino acids and all mutated amino acids of the Best Peptide (Pos) are at an anchor position and the WT peptide has good binding (IC50 WT < binding_threshold)

Clonal Criteria

Best Peptide is likely in the founding clone of the tumor

DNA VAF > tumor_purity / 4

Problematic Position Criteria

Best Peptide does not contain a problematic amino acid as defined by the --problematic-amino-acids parameter

Prob Pos == None
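Taken together, the tiers and criteria above can be sketched as a single function. This is a simplified sketch under several assumptions and should not be read as the pVACseq implementation: the Candidate and Params classes and their field names are hypothetical, all values are assumed to be present (no NA handling), the transcript, anchor, and problematic position criteria are passed in as precomputed booleans, tumor_purity defaults to 1.0 here although the documented default is None, and the order in which tiers are checked is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass
class Candidate:
    ic50_mt: float
    ic50_pct_mt: float
    pres_pct_mt: float
    im_pct_mt: float
    rna_expr: float
    rna_vaf: float
    rna_depth: int
    dna_vaf: float
    ref_match: bool
    transcript_ok: bool = True
    anchor_ok: bool = True
    prob_pos_ok: bool = True


@dataclass
class Params:
    binding_threshold: float = 500.0
    binding_percentile_threshold: float = 2.0
    presentation_percentile_threshold: float = 2.0
    immunogenicity_percentile_threshold: float = 2.0
    conservative: bool = True
    tumor_purity: float = 1.0  # assumed value for illustration
    trna_vaf: float = 0.25
    trna_cov: int = 10
    expn_val: float = 1.0


def assign_tier(c: Candidate, p: Params) -> str:
    combine = all if p.conservative else any
    score_checks = {
        "PoorBinder": combine([c.ic50_mt < p.binding_threshold,
                               c.ic50_pct_mt < p.binding_percentile_threshold]),
        "PoorPresentation": c.pres_pct_mt < p.presentation_percentile_threshold,
        "PoorImmunogenicity": c.im_pct_mt < p.immunogenicity_percentile_threshold,
    }
    scores_ok = combine(score_checks.values())

    allele_expr = c.rna_expr * c.rna_vaf
    cutoff = p.trna_vaf * p.expn_val
    expressed = allele_expr > cutoff
    low_expr = (0 < allele_expr < cutoff) or (
        c.rna_expr == 0 and c.rna_depth > p.trna_cov and c.rna_vaf > p.trna_vaf)

    other_checks = {
        "RefMatch": not c.ref_match,
        "PoorTranscript": c.transcript_ok,
        "Anchor": c.anchor_ok,
        "Subclonal": c.dna_vaf > p.tumor_purity / 4,
        "ProbPos": c.prob_pos_ok,
    }
    failed_scores = [t for t, ok in score_checks.items() if not ok]
    failed_others = [t for t, ok in other_checks.items() if not ok]

    if expressed and not failed_others:
        if scores_ok:
            return "Pass"
        if len(failed_scores) == 1:
            return failed_scores[0]
    if scores_ok and expressed and len(failed_others) == 1:
        return failed_others[0]
    if scores_ok and low_expr and not failed_others:
        return "LowExpr"
    if c.rna_expr == 0 or c.rna_vaf == 0:
        return "NoExpr"
    return "Poor"
```

A candidate failing exactly one criterion lands in the tier named after that criterion, while candidates failing two or more fall through to Poor (or NoExpr when they are not expressed).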

The pVACseq Aggregate Report Sorting

The aggregate report is sorted as follows:

Sort Criteria

Sort Order

Tier column

“Pass”, “PoorBinder”, “PoorImmunogenicity”, “PoorPresentation”, “RefMatch”, “PoorTranscript”, “LowExpr”, “Anchor”, “Subclonal”, “ProbPos”, “Poor”, “NoExpr”

Sum of ascending ranks of Allele Expr and the ascending ranks of the metrics selected via the --top-score-metric2 parameter (possible values: IC50 MT, %ile MT, IC50 %ile MT, Pres %ile MT; default: IC50 MT, %ile MT).

Ascending sum rank

First metric specified in the --top-score-metric2 as a tie breaker for identical sum ranks

Ascending rank

Gene column

Alphabetical

AA Change column

Alphabetical
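The sort order of the aggregate report can be sketched with a composite sort key. This is an illustration only: rows are represented as dictionaries, and the sum_rank and first_metric_rank keys are hypothetical, precomputed values standing in for the rank sum and the tie-breaking rank described in the table. The Tier, Gene, and AA Change keys are the report columns.

```python
TIER_ORDER = [
    "Pass", "PoorBinder", "PoorImmunogenicity", "PoorPresentation",
    "RefMatch", "PoorTranscript", "LowExpr", "Anchor", "Subclonal",
    "ProbPos", "Poor", "NoExpr",
]


def sort_report(rows):
    """Sort rows by tier, rank sum, tie-breaking rank, gene, and AA change."""
    tier_rank = {tier: index for index, tier in enumerate(TIER_ORDER)}
    return sorted(
        rows,
        key=lambda row: (
            tier_rank[row["Tier"]],    # position of the tier in TIER_ORDER
            row["sum_rank"],           # ascending sum rank
            row["first_metric_rank"],  # tie breaker for identical sum ranks
            row["Gene"],               # alphabetical
            row["AA Change"],          # alphabetical
        ),
    )
```

Because Python compares tuples element by element, later keys are only consulted when all earlier keys are equal, which mirrors the top-to-bottom order of the sort criteria.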

aggregated.tsv.reference_matches Report Columns

This file is only generated when the --run-reference-proteome-similarity option is chosen.

Column Name

Description (BLAST)

Description (reference fasta)

Chromosome

The chromosome of this variant

Start

The start position of this variant in the zero-based, half-open coordinate system

Stop

The stop position of this variant in the zero-based, half-open coordinate system

Reference

The reference allele

Variant

The alt allele

Transcript

The Ensembl ID of the affected transcript

MT Epitope Seq

The mutant peptide sequence for the epitope candidate

Peptide

The peptide sequence submitted to BLAST

The peptide sequence to search for in the reference proteome

Hit ID

The BLAST alignment hit ID (reference proteome sequence ID)

The FASTA header ID of the entry where the match was made

Hit Definition

The BLAST alignment hit definition (reference proteome sequence name)

The FASTA header description of the entry where the match was made

Match Window

The substring of the Peptide that was found in the Match Sequence

Match Sequence

The BLAST match sequence

The FASTA sequence of the entry where the match was made

Match Start

The match start position of the Match Window in the Match Sequence

Match Stop

The match stop position of the Match Window in the Match Sequence