Output Files
The pVACseq pipeline will write its results in separate folders depending on which prediction algorithms were chosen:
MHC_Class_I
: for MHC class I prediction algorithms
MHC_Class_II
: for MHC class II prediction algorithms
combined
: if both MHC class I and MHC class II prediction algorithms were run, this folder combines the neoepitope predictions from both
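As a minimal sketch of how a downstream script might discover which of these result folders a run produced: the folder names come from the list above, while the helper name `find_result_folders` and the idea of passing the run's output directory are assumptions made for illustration, not part of pVACseq.

```python
from pathlib import Path

# Folder names as listed in this section.
RESULT_FOLDERS = ("MHC_Class_I", "MHC_Class_II", "combined")


def find_result_folders(output_dir):
    """Return the names of the result folders that exist under output_dir.

    Hypothetical helper: it only checks for directories with the
    documented names and does not inspect their contents.
    """
    base = Path(output_dir)
    return [name for name in RESULT_FOLDERS if (base / name).is_dir()]
```

A run with only class I algorithms would be expected to yield a single-element list here, since the other two folders would not exist.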
Each folder will contain the same list of output files (listed in the order created):
| File Name | Description |
|---|---|
| | An intermediate file with variant, transcript, coverage, VAF, and expression information parsed from the input files. |
| | The above file but split into smaller chunks for easier processing with IEDB. |
| | A FASTA file with mutant and wildtype peptide subsequences for all processable variant-transcript combinations. |
| | A FASTA file with mutant and wildtype peptide subsequences specific for use in running the net_chop tool. |
| | A list of all predicted epitopes and their binding affinity scores, with additional variant information from the |
| | The above file after applying all filters, with (optionally) cleavage site, stability predictions, and reference proteome similarity metrics added. |
| | An aggregated version of the |
| | A file outlining details of reference proteome matches |
| | A JSON file with detailed information about the predicted epitopes, formatted for pVACview. This file, in combination with the aggregated.tsv file, is required to visualize your results in pVACview. Not generated when running with elution algorithms only. |
| | pVACview R Shiny application files. Not generated when running with elution algorithms only. |
| | Directory containing image files for pVACview. Not generated when running with elution algorithms only. |
Filters applied to the filtered.tsv file
The filtered.tsv file is the all_epitopes file with the following filters applied (in order):
Binding Filter
Coverage Filter
Transcript Support Level Filter
Top Score Filter
Please see the Standalone Filter Commands documentation for more information on each individual filter. The standalone filter commands may be useful to reproduce the filtering or to choose different filtering thresholds.
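To make the idea of a binding threshold concrete, here is a rough sketch of filtering a tab-separated report on an IC50 column. It is not the pVACseq implementation, and the standalone filter commands remain the supported route. The function name `filter_by_binding` and the column name `mt_ic50` are made up for this example; the default of 500 mirrors the binding threshold default documented in the tiering parameters.

```python
import csv
import io


def filter_by_binding(rows, score_column, threshold=500.0):
    """Keep rows whose IC50 value in score_column is below threshold.

    Assumption: rows with a missing or non-numeric score are dropped.
    """
    kept = []
    for row in rows:
        try:
            score = float(row.get(score_column, ""))
        except (TypeError, ValueError):
            continue  # no usable score for this row
        if score < threshold:
            kept.append(row)
    return kept


# Tiny in-memory stand-in for a report; "mt_ic50" is a hypothetical column name.
example = "peptide\tmt_ic50\nAAAAAAAAA\t120.5\nCCCCCCCCC\t850.0\nDDDDDDDDD\tNA\n"
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
strong = filter_by_binding(rows, "mt_ic50")
```

Passing the column name as an argument keeps the sketch independent of the exact header used in a given report.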
Prediction Algorithms Supporting Percentile Information
pVACseq outputs binding affinity percentile rank information when provided by a chosen prediction algorithm. The following prediction algorithms calculate a percentile rank:
MHCflurry
NetMHC
NetMHCcons
NetMHCpan
NetMHCIIpan
NNalign
PickPocket
SMM
SMMPMBEC
SMMalign
The following prediction algorithms do not provide a percentile rank:
MHCnuggets
Prediction Algorithms Supporting Elution Scores
MHCflurryEL
NetMHCpanEL
NetMHCIIpanEL
BigMHC_EL
Prediction Algorithms Supporting Immunogenicity Scores
BigMHC_IM
DeepImmuno
Please note that when running pVACseq with only elution or immunogenicity algorithms, neither the aggregate report nor the pVACview files are created.
all_epitopes.tsv and filtered.tsv Report Columns
| Column Name | Description |
|---|---|
| | The chromosome of this variant |
| | The start position of this variant in the zero-based, half-open coordinate system |
| | The stop position of this variant in the zero-based, half-open coordinate system |
| | The reference allele |
| | The alt allele |
| | The Ensembl ID of the affected transcript |
| | The transcript support level (TSL) of the affected transcript. |
| | The protein sequence length of the affected transcript |
| | The biotype of the affected transcript |
| | The Ensembl ID of the affected gene |
| | The type of variant. |
| | The amino acid change of this mutation |
| | The protein position of the mutation |
| | The Ensembl gene name of the affected gene |
| | The HGVS coding sequence variant name |
| | The HGVS protein sequence variant name |
| | The HLA allele for this prediction |
| | The peptide length of the epitope |
| | The one-based position of the epitope within the protein sequence used to make the prediction |
| | The one-based positional range (inclusive) of the mutation within the epitope sequence. If the mutation is a deletion, the amino acids flanking the deletion are recorded. Note that in the case of ambiguous amino acid changes, this reflects the change that is left-aligned, starting from the first changed amino acid; this may differ from the |
| | The mutant epitope sequence |
| | The wildtype (reference) epitope sequence at the same position in the full protein sequence. |
| | Prediction algorithm with the lowest mutant IC50 binding affinity for this epitope |
| | Lowest IC50 binding affinity of all prediction algorithms used |
| | IC50 binding affinity of the wildtype epitope. |
| | |
| | Prediction algorithm with the lowest binding affinity percentile rank for this epitope |
| | Lowest percentile rank of this epitope’s IC50 binding affinity of all prediction algorithms used (those that provide percentile output) |
| | Binding affinity percentile rank of the wildtype epitope. |
| | Tumor DNA depth at this position. |
| | Tumor DNA variant allele frequency (VAF) at this position. |
| | Tumor RNA depth at this position. |
| | Tumor RNA variant allele frequency (VAF) at this position. |
| | Normal DNA depth at this position. |
| | Normal DNA variant allele frequency (VAF) at this position. |
| | Gene expression value for the annotated gene containing the variant. |
| | Transcript expression value for the annotated transcript containing the variant. |
| | Median IC50 binding affinity of the mutant epitope across all prediction algorithms used |
| | Median IC50 binding affinity of the wildtype epitope across all prediction algorithms used. |
| | |
| | Median binding affinity percentile rank of the mutant epitope across all prediction algorithms (those that provide percentile output) |
| | Median binding affinity percentile rank of the wildtype epitope across all prediction algorithms used (those that provide percentile output) |
| | IC50 binding affinity and percentile ranks for the |
| | MHCflurry elution processing score and presentation score and percentiles for the |
| | A unique identifier for this variant-transcript combination |
| | A list of positions in the |
| | Mean hydropathy of last 7 residues on the C-terminus of the peptide |
| | Max GRAVY score of any kmer in the amino acid sequence. Used to determine if there are any extremely hydrophobic regions within a longer amino acid sequence. |
| | Is the N-terminal amino acid a Glutamine, Glutamic acid, or Cysteine? |
| | Is the C-terminal amino acid a Cysteine? |
| | Is the C-terminal amino acid a Proline? |
| | Number of Cysteines in the amino acid sequence. Problematic because they can form disulfide bonds across distant parts of the peptide |
| | Is the N-terminal amino acid an Asparagine? |
| | Number of Asparagine-Proline bonds. Problematic because they can spontaneously cleave the peptide |
| | Position of the highest predicted cleavage score |
| | Highest predicted cleavage score |
| | List of all cleavage positions and their cleavage score |
| | Stability of the pMHC-I complex |
| | Half-life of the pMHC-I complex |
| | The % rank stability of the pMHC-I complex |
| | Nearest neighbor to the |
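The start and stop positions in these reports use the zero-based, half-open coordinate system, while many genome browsers and VCF files use one-based, inclusive positions. The following sketch shows the conversion in both directions; the function names are hypothetical and chosen only for this example.

```python
def to_one_based_inclusive(start, stop):
    """Convert a zero-based, half-open interval [start, stop) to one-based, inclusive."""
    return start + 1, stop


def to_zero_based_half_open(start, stop):
    """Convert a one-based, inclusive interval [start, stop] to zero-based, half-open."""
    return start - 1, stop


def interval_length(start, stop):
    """Length in bases of a zero-based, half-open interval."""
    return stop - start


# A single-base variant at one-based position 100 is stored as start=99, stop=100.
snv = to_zero_based_half_open(100, 100)
```

In the half-open convention the interval length is simply stop minus start, which is one reason the convention is popular for interval arithmetic.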
all_epitopes.aggregated.tsv Report Columns
The all_epitopes.aggregated.tsv file is an aggregated version of the all_epitopes TSV. It shows the best-scoring epitope for each variant, and outputs additional binding affinity, expression, and coverage information for that epitope. It also gives information about the total number of well-scoring epitopes for each variant, the number of transcripts covered by those epitopes, as well as the HLA alleles that those epitopes are well-binding to. Lastly, the report will bin variants into tiers that offer suggestions as to the suitability of variants for use in vaccines.
Only epitopes meeting the --aggregate-inclusion-threshold are included in this report (default: 5000).
Whether the median or the lowest binding affinity metrics are output in the IC50 MT, IC50 WT, %ile MT, and %ile WT columns is controlled by the --top-score-metric parameter.
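To illustrate the difference between the two metrics, the sketch below summarizes per-algorithm IC50 values either by their median or by their lowest value. The function name `top_score`, the dictionary layout, and the metric strings `"median"` and `"lowest"` are assumptions for this example rather than a description of pVACseq internals.

```python
from statistics import median


def top_score(ic50_by_algorithm, metric="median"):
    """Summarize per-algorithm IC50 values by median or by lowest value.

    ic50_by_algorithm maps an algorithm name to its predicted IC50 (nM).
    """
    scores = list(ic50_by_algorithm.values())
    if not scores:
        raise ValueError("at least one prediction is required")
    if metric == "median":
        return median(scores)
    if metric == "lowest":
        return min(scores)
    raise ValueError(f"unknown metric: {metric!r}")


# Made-up predictions for one epitope from three algorithms.
predictions = {"algorithm_a": 100.0, "algorithm_b": 300.0, "algorithm_c": 900.0}
```

With these made-up values a 500 nM cutoff is passed under either metric, but a single optimistic algorithm can pull the lowest score far below the median, which is why the choice of metric can change which epitopes are reported.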
| Column Name | Description |
|---|---|
| | A unique identifier for the variant |
| | For each HLA allele in the run, the number of this variant’s epitopes that bound well to the HLA allele (with median/lowest mutant binding affinity < binding_threshold) |
| | The Ensembl gene name of the affected gene |
| | The amino acid change for the mutation |
| | The number of transcripts for this mutation that resulted in at least one well-binding peptide (median/lowest mutant binding affinity < 500). |
| | The best-binding mutant epitope sequence (see Best Peptide Criteria below for more details on how this is determined) |
| | The best transcript of all transcripts coding for the Best Peptide (see Best Peptide Criteria below for more details on how this is determined) |
| | The Transcript Support Level of the Best Transcript |
| | The Allele that the Best Peptide is binding to |
| | The one-based position of the start of the mutation within the epitope sequence. |
| | A list of positions in the Best Peptide that are problematic. |
| | The number of unique well-binding peptides for this mutation. |
| | Median or lowest IC50 binding affinity of the best-binding mutant epitope across all prediction algorithms used |
| | Median or lowest IC50 binding affinity of the corresponding wildtype epitope across all prediction algorithms used. |
| | Median or lowest binding affinity percentile rank of the best-binding mutant epitope across all prediction algorithms used (those that provide percentile output) |
| | Median or lowest binding affinity percentile rank of the corresponding wildtype epitope across all prediction algorithms used (those that provide percentile output) |
| | Gene expression value for the annotated gene containing the variant. |
| | Tumor RNA variant allele frequency (VAF) at this position. |
| | RNA Expr * RNA VAF |
| | Tumor RNA depth at this position. |
| | Tumor DNA variant allele frequency (VAF) at this position. |
| | A tier suggesting the suitability of variants for use in vaccines. |
| | Was there a match of the mutated peptide sequence to the reference proteome? |
| | Column to store the evaluation of each variant when evaluating the run in pVACview. Either |
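One column in this table is described as the product RNA Expr * RNA VAF. A minimal sketch of that arithmetic follows; the function name `allele_expression` and the handling of missing values (returning `None`) are assumptions made for illustration.

```python
def allele_expression(rna_expr, rna_vaf):
    """Return RNA Expr * RNA VAF, or None if either input is missing.

    Assumption: a missing value (None) propagates instead of being
    treated as zero, so "not measured" stays distinct from "not expressed".
    """
    if rna_expr is None or rna_vaf is None:
        return None
    return rna_expr * rna_vaf


# A gene expression value of 12.0 with a tumor RNA VAF of 0.25.
example_value = allele_expression(12.0, 0.25)
```

The product scales the gene-level expression by the fraction of reads carrying the variant, so it approximates how much of the expressed transcript actually contains the mutation.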
Best Peptide Criteria
To determine the Best Peptide, all peptides meeting the --aggregate-inclusion-threshold are evaluated as follows:
Pick all entries with a variant transcript that has a protein_coding Biotype
Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= maximum_transcript_support_level
Of the remaining entries, pick the entries with no Problematic Positions
Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below)
Of the remaining entries, pick the one with the lowest median/best MT IC50 score, lowest Transcript Support Level, and longest transcript.
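The selection steps above can be sketched as a chain of filters followed by a sort. This is an illustration under stated assumptions, not the pVACseq source: the dictionary keys (`biotype`, `tsl`, `problematic_positions`, `anchor_pass`, `ic50`, `transcript_length`) are hypothetical, the final ranking is read as lexicographic tie-breaking, and a filter that would remove every candidate is skipped so that a peptide can always be chosen.

```python
def pick_best_peptide(entries, max_tsl=1):
    """Pick one entry following the Best Peptide steps described above."""

    def keep(candidates, predicate):
        passing = [e for e in candidates if predicate(e)]
        # Assumption: if nothing passes, fall back to the previous candidates.
        return passing or candidates

    remaining = list(entries)
    if not remaining:
        return None
    remaining = keep(remaining, lambda e: e["biotype"] == "protein_coding")
    remaining = keep(remaining, lambda e: e["tsl"] <= max_tsl)
    remaining = keep(remaining, lambda e: not e["problematic_positions"])
    remaining = keep(remaining, lambda e: e["anchor_pass"])
    # Lowest IC50 first, then lowest TSL, then longest transcript.
    return min(
        remaining,
        key=lambda e: (e["ic50"], e["tsl"], -e["transcript_length"]),
    )


# Made-up candidates for a single variant.
candidates = [
    {"id": "a", "biotype": "protein_coding", "tsl": 1,
     "problematic_positions": [], "anchor_pass": True,
     "ic50": 120.0, "transcript_length": 400},
    {"id": "b", "biotype": "protein_coding", "tsl": 1,
     "problematic_positions": [], "anchor_pass": True,
     "ic50": 120.0, "transcript_length": 650},
    {"id": "c", "biotype": "nonsense_mediated_decay", "tsl": 1,
     "problematic_positions": [], "anchor_pass": True,
     "ic50": 15.0, "transcript_length": 900},
]
best = pick_best_peptide(candidates)
```

Note that candidate `c` has the strongest predicted binding but is removed at the first step because its transcript is not protein coding; `a` and `b` tie on IC50 and TSL, so the longer transcript decides.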
The pVACseq Aggregate Report Tiers
Tiering Parameters
To tier the Best Peptide, several cutoffs can be adjusted using arguments provided to the pVACseq run:
| Parameter | Description | Default |
|---|---|---|
| | The threshold used for filtering epitopes on the IC50 MT binding affinity. | 500 |
| | Instead of the hard cutoff set by the | False |
| | When set, use this threshold to filter epitopes on the %ile MT score in addition to having to meet the binding threshold. | None |
| | Value between 0 and 1 indicating the fraction of tumor cells in the tumor sample. Information is used for a simple estimation of whether variants are subclonal or clonal based on VAF. If not provided, purity is estimated directly from the VAFs. | None |
| | Tumor RNA VAF Cutoff. Used to calculate the allele expression cutoff for tiering. | 0.25 |
| | Tumor RNA Coverage Cutoff. Used as a cutoff for tiering. | 10 |
| | Gene and Expression cutoff. Used to calculate the allele expression cutoff for tiering. | 1.0 |
| | The threshold to use for filtering epitopes on the Ensembl transcript support level (TSL). Transcript support level needs to be <= this cutoff to be included in most tiers. | 1 |
| | Use allele-specific anchor positions when tiering epitopes in the aggregate report. This option is available for 8, 9, 10, and 11mers and only for HLA-A, B, and C alleles. If this option is not enabled or as a fallback for unsupported lengths and alleles, the default positions of [1, 2, epitope length - 1, and epitope length] are used. Please see https://doi.org/10.1101/2020.12.08.416271 for more details. | False |
| | For determining allele-specific anchors, each position is assigned a score based on how binding is influenced by mutations. From these scores, the relative contribution of each position to the overall binding is calculated. Starting with the highest relative contribution, positions whose scores together account for the selected contribution threshold are assigned as anchor locations. As a result, a higher threshold leads to the inclusion of more positions to be considered anchors. | 0.8 |
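The anchor contribution threshold described in the last row can be pictured as a cumulative sum over positions ranked by their relative contribution. The sketch below assumes per-position scores are already available as a dictionary; the function names and the example scores are invented for illustration, and the real allele-specific scores come from the publication linked above.

```python
def default_anchor_positions(epitope_length):
    """Default anchors: positions 1, 2, length - 1, and length (one-based)."""
    return [1, 2, epitope_length - 1, epitope_length]


def anchor_positions(position_scores, threshold=0.8):
    """Select anchors whose relative contributions together reach threshold.

    position_scores maps a one-based position to a non-negative score.
    Positions are taken from highest to lowest score until the running
    share of the total score is at least `threshold`.
    """
    total = sum(position_scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    ranked = sorted(position_scores.items(), key=lambda kv: kv[1], reverse=True)
    anchors, cumulative = [], 0.0
    for position, score in ranked:
        if cumulative >= threshold:
            break
        anchors.append(position)
        cumulative += score / total
    return sorted(anchors)


# Invented scores for a 9mer in which positions 2 and 9 dominate.
scores = {1: 5, 2: 45, 3: 4, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 40}
```

With these invented scores, positions 2 and 9 already account for 85% of the total, so the default threshold of 0.8 selects only those two, while raising the threshold pulls in additional positions.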
Tiers
Given the thresholds provided above, the Best Peptide is evaluated and binned into a tier as follows:
| Tier | Criteria |
|---|---|
| | Best Peptide passes the binding, expression, tsl, clonal, and anchor criteria |
| | Best Peptide fails the anchor criteria but passes the binding, expression, tsl, and clonal criteria |
| | Best Peptide fails the clonal criteria but passes the binding, tsl, and anchor criteria |
| | Best Peptide meets the Low Expression Criteria and passes the binding, tsl, clonal, and anchor criteria |
| | Best Peptide is not expressed (RNA Expr == 0 or RNA VAF == 0) |
| | Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
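The tier table can be read as a sequence of checks where the first matching row wins. The sketch below follows the wording of the rows as written and returns the row number (1 to 6) instead of a tier name, since the names are not reproduced here. The function name, the boolean inputs, and the first-match ordering are assumptions for illustration.

```python
def assign_tier_row(binding, expression, tsl, clonal, anchor,
                    low_expression=False, not_expressed=False):
    """Return the table row (1-6) whose description matches the inputs.

    Each argument states whether the Best Peptide passes that criterion;
    low_expression and not_expressed flag the two expression special cases.
    Rows are checked in table order and the first match is returned.
    """
    if binding and expression and tsl and clonal and anchor:
        return 1
    if not anchor and binding and expression and tsl and clonal:
        return 2
    if not clonal and binding and tsl and anchor:
        return 3
    if low_expression and binding and tsl and clonal and anchor:
        return 4
    if not_expressed:
        return 5
    return 6
```

Because row 3 as worded does not mention the expression criteria, this sketch does not check it there; an implementation that also requires expression for that row would route some peptides to a later row instead.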
Criteria Details
| Binding Criteria | Pass if Best Peptide is a strong binder | |
| Expression Criteria | Pass if Best Transcript is expressed | |
| Low Expression Criteria | Peptide has low expression or no expression but RNA VAF and coverage | |
| TSL Criteria | Pass if Best Transcript has good transcript support level | |
| Clonal Criteria | Best Peptide is likely in the founding clone of the tumor | |
| Anchor Criteria | Fail if all mutated amino acids of the Best Peptide ( |
aggregated.tsv.reference_matches Report Columns
This file is only generated when the --run-reference-proteome-similarity option is chosen.
| Column Name | Description (BLAST) | Description (reference fasta) |
|---|---|---|
| | The chromosome of this variant | |
| | The start position of this variant in the zero-based, half-open coordinate system | |
| | The stop position of this variant in the zero-based, half-open coordinate system | |
| | The reference allele | |
| | The alt allele | |
| | The Ensembl ID of the affected transcript | |
| | The mutant peptide sequence for the epitope candidate | |
| | The peptide sequence submitted to BLAST | The peptide sequence to search for in the reference proteome |
| | The BLAST alignment hit ID (reference proteome sequence ID) | The FASTA header ID of the entry where the match was made |
| | The BLAST alignment hit definition (reference proteome sequence name) | The FASTA header description of the entry where the match was made |
| | The substring of the | |
| | The BLAST match sequence | The FASTA sequence of the entry where the match was made |
| | The match start position of the | |
| | The match stop position of the |