
Output Files¶
The pVACbind pipeline will write its results in separate folders depending on which prediction algorithms were chosen:
MHC_Class_I
: for MHC class I prediction algorithms
MHC_Class_II
: for MHC class II prediction algorithms
combined
: if both MHC class I and MHC class II prediction algorithms were run, this folder combines the neoepitope predictions from both
Each folder will contain the same list of output files (listed in the order created):
| File Name | Description |
|---|---|
| all_epitopes.tsv | A list of all predicted epitopes and their binding affinity scores, with additional variant information from the |
| filtered.tsv | The above file after applying all filters, with cleavage site and stability predictions added. |
| all_epitopes.aggregated.tsv | An aggregated version of the all_epitopes TSV. |
| aggregated.tsv.reference_matches | A file outlining details of reference proteome matches |
Filters applied to the filtered.tsv file¶
The filtered.tsv file is the all_epitopes file with the following filters applied (in order):
Binding Filter
Top Score Filter
Please see the Standalone Filter Commands documentation for more information on each individual filter. The standalone filter commands may be useful to reproduce the filtering or to choose different filtering thresholds.
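The filter order described above can be pictured as two sequential passes over the rows of the all_epitopes file. The following is a minimal sketch, not pVACbind's own implementation. The dictionary keys (`sequence_id`, `epitope`, `median_ic50`), the strict `<` comparison, and the rule of keeping one best row per input sequence are assumptions made for illustration. The 500 cutoff mirrors the default binding threshold listed under Tiering Parameters.

```python
def binding_filter(rows, threshold=500.0):
    """Keep rows whose (assumed) median IC50 is below the threshold."""
    return [row for row in rows if row["median_ic50"] < threshold]


def top_score_filter(rows):
    """Keep the lowest-IC50 row for each (assumed) input sequence ID."""
    best = {}
    for row in rows:
        current = best.get(row["sequence_id"])
        if current is None or row["median_ic50"] < current["median_ic50"]:
            best[row["sequence_id"]] = row
    return list(best.values())


# Hypothetical rows standing in for lines of the all_epitopes file.
all_epitopes = [
    {"sequence_id": "seq1", "epitope": "SIINFEKL", "median_ic50": 120.0},
    {"sequence_id": "seq1", "epitope": "IINFEKLT", "median_ic50": 450.0},
    {"sequence_id": "seq2", "epitope": "GILGFVFTL", "median_ic50": 900.0},
]

# Filters are applied in the documented order: binding first, then top score.
filtered = top_score_filter(binding_filter(all_epitopes))
print([row["epitope"] for row in filtered])
```

To reproduce the actual filtering on real output, the standalone filter commands remain the authoritative route; this sketch only illustrates the ordering.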
Prediction Algorithms Supporting Elution Scores¶
MHCflurryEL (Presentation and Processing)
NetMHCpanEL
NetMHCIIpanEL
BigMHC_EL
Prediction Algorithms Supporting Immunogenicity Scores¶
BigMHC_IM
DeepImmuno
Please note that when running pVACbind with only elution or immunogenicity algorithms, neither the aggregate report nor the pVACview files are created.
Prediction Algorithms Supporting Percentile Information¶
pVACbind outputs percentile rank information when provided by a chosen binding affinity, elution, or immunogenicity prediction algorithm. The following prediction algorithms calculate a percentile rank:
MHCflurry
MHCflurryEL (Presentation)
MHCnuggets
NetMHC
NetMHCcons
NetMHCpan
NetMHCpanEL
NetMHCIIpan
NetMHCIIpanEL
NNalign
PickPocket
SMM
SMMPMBEC
SMMalign
all_epitopes.tsv and filtered.tsv Report Columns¶
| Column Name | Description |
|---|---|
| | The FASTA ID of the peptide sequence the epitope belongs to |
| | The HLA allele for this prediction |
| | The one-based position of the epitope in the protein sequence used to make the prediction |
| | The epitope sequence |
| | Median IC50 binding affinity of the epitope across all prediction algorithms used |
| | Lowest IC50 binding affinity across all prediction algorithms used |
| | Prediction algorithm with the lowest IC50 binding affinity for this epitope |
| | Median binding affinity percentile rank of the epitope across all prediction algorithms used (those that provide percentile output) |
| | Lowest binding affinity percentile rank across all prediction algorithms used (those that provide percentile output) |
| | Prediction algorithm with the lowest binding affinity percentile rank for this epitope |
| | IC50 binding affinity scores and percentiles for the |
| | MHCflurry elution processing score and presentation score and percentiles for the |
| | Mean hydropathy of the last 7 residues on the C-terminus of the peptide |
| | Max GRAVY score of any kmer in the amino acid sequence. Used to determine if there are any extremely hydrophobic regions within a longer amino acid sequence. |
| | Is the N-terminal amino acid a Glutamine, Glutamic acid, or Cysteine? |
| | Is the C-terminal amino acid a Cysteine? |
| | Is the C-terminal amino acid a Proline? |
| | Number of Cysteines in the amino acid sequence. Problematic because they can form disulfide bonds across distant parts of the peptide |
| | Is the N-terminal amino acid an Asparagine? |
| | Number of Asparagine-Proline bonds. Problematic because they can spontaneously cleave the peptide |
| | Position of the highest predicted cleavage score |
| | Highest predicted cleavage score |
| | List of all cleavage positions and their cleavage score |
| | Stability of the pMHC-I complex |
| | Half-life of the pMHC-I complex |
| | The % rank stability of the pMHC-I complex |
| | Nearest neighbor to the |
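The peptide-level sequence checks described in the table above can be approximated in a few lines. This is a sketch under stated assumptions, not the pVACbind code: it assumes the Kyte-Doolittle hydropathy scale for GRAVY, a window size of 7 for the "any kmer" maximum, and a non-empty sequence of standard amino acids. The dictionary keys in the result are hypothetical labels, not the report's column names.

```python
# Kyte-Doolittle hydropathy values (assumed scale for the GRAVY calculations).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean hydropathy of a non-empty amino acid sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def max_kmer_gravy(seq, k=7):
    """Highest GRAVY over all k-length windows (whole sequence if shorter)."""
    starts = range(max(1, len(seq) - k + 1))
    return max(gravy(seq[i:i + k]) for i in starts)


def peptide_flags(seq):
    """Collect the sequence checks under hypothetical key names."""
    return {
        "cterm_7mer_gravy": gravy(seq[-7:]),
        "max_7mer_gravy": max_kmer_gravy(seq, 7),
        "difficult_n_terminal_residue": seq[0] in "QEC",
        "c_terminal_cysteine": seq[-1] == "C",
        "c_terminal_proline": seq[-1] == "P",
        "cysteine_count": seq.count("C"),
        "n_terminal_asparagine": seq[0] == "N",
        "asparagine_proline_bond_count": seq.count("NP"),
    }


flags = peptide_flags("NPCAAAAAAC")
print(flags)
```

For the made-up sequence above, the N-terminal Asparagine, the single Asparagine-Proline bond, and both Cysteines are flagged.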
all_epitopes.aggregated.tsv Report Columns¶
The all_epitopes.aggregated.tsv file is an aggregated version of the all_epitopes TSV. It shows the best-scoring epitope for each variant, and outputs binding affinity and other information for that epitope. It gives information about the total number of well-scoring epitopes for each variant as well as the HLA alleles that those epitopes are well-binding to. Lastly, the report will bin variants into tiers that offer suggestions as to the suitability of variants for use in vaccines.
Only epitopes meeting the --aggregate-inclusion-binding-threshold are included in this report (default: 5000). If the number of unique epitopes for a mutation meeting this threshold exceeds the --aggregate-inclusion-count-limit, only the n best-binding epitopes up to this limit are included (default: 15). If the Best Peptide does not meet the aggregate inclusion criteria, it will still be counted in the Num Included Peptides.
Whether the median or the lowest binding affinity metrics are used for determining the included epitopes, selecting the best-scoring epitope, and which values are output in the IC50 MT and %ile MT columns is controlled by the --top-score-metric parameter.
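The inclusion rules above can be sketched as follows. This is an illustration under assumptions rather than the pipeline's actual code: the input layout (a mapping from epitope sequence to per-algorithm IC50 values), the metric strings `"median"` and `"lowest"`, the algorithm names, and the strict `<` comparison are all hypothetical. The defaults of 5000 and 15 are the documented ones.

```python
from statistics import median


def top_score(ic50_by_algorithm, metric="median"):
    """Summarise per-algorithm IC50 values with the chosen metric."""
    values = list(ic50_by_algorithm.values())
    return median(values) if metric == "median" else min(values)


def select_included(epitopes, threshold=5000.0, count_limit=15, metric="median"):
    """Return the best-binding epitopes that meet the inclusion threshold."""
    scored = [(top_score(ic50s, metric), seq) for seq, ic50s in epitopes.items()]
    passing = sorted(item for item in scored if item[0] < threshold)
    # Only the n best-binding epitopes up to the count limit are kept.
    return [seq for _, seq in passing[:count_limit]]


# Hypothetical predictions for one variant from two made-up algorithms.
epitopes = {
    "SIINFEKL": {"algo_a": 120.0, "algo_b": 480.0},
    "KLGGALQAK": {"algo_a": 4000.0, "algo_b": 9000.0},
    "GILGFVFTL": {"algo_a": 50.0, "algo_b": 2000.0},
}

print(select_included(epitopes, metric="median"))
print(select_included(epitopes, metric="lowest", count_limit=2))
```

Switching the metric changes both which epitopes pass the threshold and how they are ranked, which is why the same parameter also determines the values reported for the best-scoring epitope.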
| Column Name | Description |
|---|---|
| | A unique identifier for the variant |
| | For each HLA allele in the run, the number of this variant’s epitopes that bound well to the HLA allele (with median binding affinity < 1000) |
| Best Peptide | The best-binding epitope sequence (lowest median binding affinity) |
| | A list of positions in the Best Peptide that are problematic. |
| Num Included Peptides | The number of included peptides according to the |
| | The number of included peptides for this mutation that are well-binding. |
| IC50 MT | Median IC50 binding affinity of the best-binding epitope across all prediction algorithms used |
| %ile MT | Median binding affinity percentile rank of the best-binding epitope across all prediction algorithms used (those that provide percentile output) |
| | Was there a match of the peptide sequence to the reference proteome? |
| | Column to store the evaluation of each variant. Either |
The pVACbind Aggregate Report Tiers¶
Tiering Parameters¶
To tier the Best Peptide, several cutoffs can be adjusted using parameters provided to the pVACbind run:
| Parameter | Description | Default |
|---|---|---|
| | The threshold used for filtering epitopes on the IC50 MT binding affinity. | 500 |
| | Instead of the hard cutoff set by the | False |
| | When set, use this threshold to filter epitopes on the %ile MT score in addition to having to meet the binding threshold. | None |
| | Specify the candidate inclusion strategy. The | conservative |
Tiers¶
Given the thresholds provided above, the Best Peptide is evaluated and binned into tiers as follows:
| Tier | Criteria |
|---|---|
| | Best Peptide passes the binding criteria |
| | Best Peptide fails the binding criteria |
Criteria Details¶
| Binding Criteria | Pass if Best Peptide is strong binder |
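The binding criteria can be sketched as a single check on the Best Peptide. This is a minimal illustration, assuming a strict `<` comparison and the behaviour described for the percentile threshold (it must be met in addition to the binding threshold when set). The function and argument names are hypothetical, allele-specific cutoffs are not modelled, and the function returns a boolean rather than a tier label.

```python
def passes_binding_criteria(ic50, percentile=None,
                            binding_threshold=500.0,
                            percentile_threshold=None):
    """Return True if the Best Peptide counts as a strong binder."""
    if ic50 >= binding_threshold:
        return False
    if percentile_threshold is not None:
        # With a percentile threshold set, a missing or too-high
        # percentile fails the check even if the IC50 passes.
        if percentile is None or percentile >= percentile_threshold:
            return False
    return True


print(passes_binding_criteria(120.0))
print(passes_binding_criteria(120.0, percentile=3.0, percentile_threshold=2.0))
```

The first call uses only the default IC50 cutoff of 500, while the second adds a percentile cutoff that the example peptide does not meet.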
aggregated.tsv.reference_matches Report Columns¶
This file is only generated when the --run-reference-proteome-similarity option is chosen.
| Column Name | Description (BLAST) | Description (reference fasta) |
|---|---|---|
| | A unique identifier for the variant | A unique identifier for the variant |
| | The mutant peptide sequence for the epitope candidate | The mutant peptide sequence for the epitope candidate |
| | The peptide sequence submitted to BLAST | The peptide sequence to search for in the reference proteome |
| | The BLAST alignment hit ID (reference proteome sequence ID) | The FASTA header ID of the entry where the match was made |
| | The BLAST alignment hit definition (reference proteome sequence name) | The FASTA header description of the entry where the match was made |
| | The substring of the | The substring of the |
| | The BLAST match sequence | The FASTA sequence of the entry where the match was made |
| | The match start position of the | The match start position of the |
| | The match stop position of the | The match stop position of the |