
Output Files
The pVACseq pipeline will write its results in separate folders depending on which prediction algorithms were chosen:
MHC_Class_I
: for MHC class I prediction algorithms
MHC_Class_II
: for MHC class II prediction algorithms
combined
: If both MHC class I and MHC class II prediction algorithms were run, this folder combines the neoepitope predictions from both
Each folder will contain the same list of output files (listed in the order created):
File Name |
Description |
---|---|
|
An intermediate file with variant, transcript, coverage, VAF, and expression information parsed from the input files. |
|
The above file but split into smaller chunks for easier processing with IEDB. |
|
A list of all predicted epitopes and their binding affinity scores, with additional variant information from the |
|
The above file after applying all filters, with cleavage site and stability predictions added. |
|
A condensed version of the filtered TSV with only the most important columns remaining, with a priority score for each neoepitope candidate added. |
Filters applied to the filtered.tsv file
The filtered.tsv file is the all_epitopes file with the following filters applied (in order):
Binding Filter
Coverage Filter
Transcript Support Level Filter
Top Score Filter
Please see the Standalone Filter Commands documentation for more information on each individual filter. The standalone filter commands may be useful to reproduce the filtering or to choose different filtering thresholds.
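As a rough illustration of what "applied in order" means, the sketch below runs two simplified filters over a few in-memory rows. It is not the pVACseq implementation: the cutoff values, the `Epitope` column, and the helper names are hypothetical, and only the `Median MT Score` and `Tumor DNA VAF` column names are taken from the ranking description on this page.

```python
import csv
import io

# Hypothetical thresholds, chosen only for this example.
BINDING_CUTOFF = 500.0     # keep epitopes with IC50 at or below this value
MIN_TUMOR_DNA_VAF = 0.25   # keep variants with at least this tumor DNA VAF

# Tiny stand-in for an all_epitopes-style TSV (made-up values).
EXAMPLE_TSV = (
    "Epitope\tMedian MT Score\tTumor DNA VAF\n"
    "KLLEPVLLL\t120.5\t0.40\n"
    "AAAWYLWEV\t850.0\t0.45\n"
    "GILGFVFTL\t60.0\t0.10\n"
)


def binding_filter(row):
    """Simplified binding filter: lower IC50 means stronger predicted binding."""
    return float(row["Median MT Score"]) <= BINDING_CUTOFF


def coverage_filter(row):
    """Simplified coverage filter based on tumor DNA VAF only."""
    return float(row["Tumor DNA VAF"]) >= MIN_TUMOR_DNA_VAF


def apply_filters(rows, filters):
    """Apply each filter in turn; later filters only see surviving rows."""
    for keep in filters:
        rows = [row for row in rows if keep(row)]
    return rows


def load_rows(tsv_text):
    """Parse tab-separated text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


rows = load_rows(EXAMPLE_TSV)
filtered = apply_filters(rows, [binding_filter, coverage_filter])
surviving_epitopes = [row["Epitope"] for row in filtered]
```

Because each filter only removes rows, a candidate must pass every filter to remain in the final list.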
all_epitopes.tsv and filtered.tsv Report Columns
Column Name |
Description |
---|---|
|
The chromosome of this variant |
|
The start position of this variant in the zero-based, half-open coordinate system |
|
The stop position of this variant in the zero-based, half-open coordinate system |
|
The reference allele |
|
The alt allele |
|
The Ensembl ID of the affected transcript |
|
The transcript support level (TSL) of the affected transcript. |
|
The Ensembl ID of the affected gene |
|
The type of variant. |
|
The amino acid change of this mutation |
|
The protein position of the mutation |
|
The Ensembl gene name of the affected gene |
|
The HGVS coding sequence variant name |
|
The HGVS protein sequence variant name |
|
The HLA allele for this prediction |
|
The peptide length of the epitope |
|
The one-based position of the epitope within the protein sequence used to make the prediction |
|
The one-based position of the start of the mutation within the epitope sequence. |
|
The mutant epitope sequence |
|
The wildtype (reference) epitope sequence at the same position in the full protein sequence. |
|
Prediction algorithm with the lowest mutant ic50 binding affinity for this epitope |
|
Lowest ic50 binding affinity of all prediction algorithms used |
|
ic50 binding affinity of the wildtype epitope. |
|
|
|
Tumor DNA depth at this position. |
|
Tumor DNA variant allele frequency (VAF) at this position. |
|
Tumor RNA depth at this position. |
|
Tumor RNA variant allele frequency (VAF) at this position. |
|
Normal DNA depth at this position. |
|
Normal DNA variant allele frequency (VAF) at this position. |
|
Gene expression value for the annotated gene containing the variant. |
|
Transcript expression value for the annotated transcript containing the variant. |
|
Median ic50 binding affinity of the mutant epitope across all prediction algorithms used |
|
Median ic50 binding affinity of the wildtype epitope across all prediction algorithms used.
|
|
|
|
ic50 scores for the |
|
Mean hydropathy of last 7 residues on the C-terminus of the peptide |
|
Max GRAVY score of any kmer in the amino acid sequence. Used to determine if there are any extremely hydrophobic regions within a longer amino acid sequence. |
|
Is the N-terminal amino acid a Glutamine, Glutamic acid, or Cysteine? |
|
Is the C-terminal amino acid a Cysteine? |
|
Is the C-terminal amino acid a Proline? |
|
Number of Cysteines in the amino acid sequence. Problematic because they can form disulfide bonds across distant parts of the peptide |
|
Is the N-terminal amino acid an Asparagine? |
|
Number of Asparagine-Proline bonds. Problematic because they can spontaneously cleave the peptide |
|
Position of the highest predicted cleavage score |
|
Highest predicted cleavage score |
|
List of all cleavage positions and their cleavage score |
|
Stability of the pMHC-I complex |
|
Half-life of the pMHC-I complex |
|
The % rank stability of the pMHC-I complex |
|
Nearest neighbor to the |
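The sequence-based columns described in the table above (terminal residue checks, cysteine count, Asparagine-Proline bonds, and hydropathy) can be approximated with a few lines of Python. This is a minimal sketch, not the pVACseq code: the function and key names are hypothetical, the Kyte-Doolittle scale is assumed as the basis for the GRAVY values, and the k-mer length `k` is a free parameter because this page does not state it.

```python
# Kyte-Doolittle hydropathy values, the scale GRAVY is conventionally based on.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy (GRAVY) of an amino acid sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def cterminal_7mer_hydropathy(peptide):
    """Mean hydropathy of the last 7 residues at the C-terminus."""
    return mean_hydropathy(peptide[-7:])


def max_kmer_gravy(sequence, k):
    """Highest GRAVY score over all windows of length k.

    Assumption: a sequence shorter than k is scored as a single window.
    """
    if len(sequence) <= k:
        return mean_hydropathy(sequence)
    return max(
        mean_hydropathy(sequence[i:i + k])
        for i in range(len(sequence) - k + 1)
    )


def sequence_flags(peptide):
    """Terminal residue checks and residue counts (hypothetical key names)."""
    return {
        "n_terminal_q_e_or_c": peptide[0] in "QEC",
        "c_terminal_cysteine": peptide[-1] == "C",
        "c_terminal_proline": peptide[-1] == "P",
        "cysteine_count": peptide.count("C"),
        "n_terminal_asparagine": peptide[0] == "N",
        "asparagine_proline_bonds": peptide.count("NP"),
    }


flags = sequence_flags("QACDNPKLC")
```

For the made-up peptide `QACDNPKLC`, the N-terminal Glutamine and C-terminal Cysteine checks are true, two Cysteines are counted, and one Asparagine-Proline bond is found.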

filtered.condensed.ranked.tsv Report Columns
Column Name |
Description |
---|---|
|
The Ensembl gene name of the affected gene. |
|
The amino acid change of this mutation. |
|
The protein position of the mutation. |
|
The HGVS coding sequence name. |
|
The HGVS protein sequence name. |
|
The HLA allele for this prediction. |
|
The one-based position of the start of the mutation within the epitope sequence. |
|
Mutant epitope sequence. |
|
Median ic50 binding affinity of the mutant epitope across all prediction algorithms used |
|
Median ic50 binding affinity of the wildtype epitope across all prediction algorithms used.
|
|
|
|
Lowest ic50 binding affinity of all prediction algorithms used |
|
ic50 binding affinity of the wildtype epitope. |
|
|
|
Tumor DNA depth at this position. |
|
Tumor DNA variant allele frequency at this position. |
|
Tumor RNA depth at this position. |
|
Tumor RNA variant allele frequency at this position. |
|
Gene expression value at this position. |
|
A priority rank for the neoepitope (best = 1). |
The pVACseq Neoepitope Priority Rank
Each of the following 4 criteria is assigned a rank-ordered value (worst = 1):
B = Rank of the mutant IC50 binding affinity, with the lowest being the best. If the --top-score-metric is set to median (default), the Median MT Score is used. If it is set to lowest, the Best MT Score is used.
F = Rank of Fold Change between MT and WT alleles, with the highest being the best.
M = Rank of mutant allele expression, calculated as (Gene Expression * Tumor RNA VAF), with the highest being the best.
D = Rank of Tumor DNA VAF, with the highest being the best.
A score is calculated from the above ranks with the following formula: B + F + (M * 2) + (D / 2). This score is then converted to a rank (best = 1).
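The ranking arithmetic can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the pVACseq source: the dictionary keys and function names are hypothetical, the candidate values are made up, and ties are broken by input order because this page does not say how ties are handled.

```python
def worst_first_ranks(values, higher_is_better=True):
    """Rank values so that the worst entry gets 1 and the best gets len(values).

    Assumption: ties are broken by input order (stable sort).
    """
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,  # worst value must come first
    )
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def priority_scores(candidates):
    """Compute B + F + (M * 2) + (D / 2) for each candidate."""
    b = worst_first_ranks([c["mt_score"] for c in candidates], higher_is_better=False)
    f = worst_first_ranks([c["fold_change"] for c in candidates])
    m = worst_first_ranks(
        [c["gene_expression"] * c["tumor_rna_vaf"] for c in candidates]
    )
    d = worst_first_ranks([c["tumor_dna_vaf"] for c in candidates])
    return [b[i] + f[i] + (m[i] * 2) + (d[i] / 2) for i in range(len(candidates))]


def priority_ranks(candidates):
    """Convert scores to final ranks, where the highest score gets rank 1."""
    scores = priority_scores(candidates)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Made-up candidates; mt_score stands for the Median MT Score or Best MT Score.
candidates = [
    {"mt_score": 50.0, "fold_change": 10.0, "gene_expression": 20.0,
     "tumor_rna_vaf": 0.5, "tumor_dna_vaf": 0.4},
    {"mt_score": 200.0, "fold_change": 2.0, "gene_expression": 5.0,
     "tumor_rna_vaf": 0.2, "tumor_dna_vaf": 0.3},
    {"mt_score": 100.0, "fold_change": 5.0, "gene_expression": 10.0,
     "tumor_rna_vaf": 0.4, "tumor_dna_vaf": 0.5},
]

scores = priority_scores(candidates)
ranks = priority_ranks(candidates)
```

In this example the first candidate has the best binding, fold change, and expression ranks, so it receives the highest score and the final rank of 1, even though its tumor DNA VAF rank is not the best. The formula weights expression twice as heavily as binding or fold change, and DNA VAF half as heavily.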
Note
The pVACseq rank calculation detailed above is just one of many ways to prioritize neoepitope candidates. The body of evidence in this area is still incomplete, and the methodology of ranking is likely to change substantially in future releases. While we have found this ranking useful, it is not a substitute for careful curation and validation efforts.