Output Files
The pVACfuse pipeline will write its results in separate folders depending on which prediction algorithms were chosen:
MHC_Class_I
: for MHC class I prediction algorithms
MHC_Class_II
: for MHC class II prediction algorithms
combined
: If both MHC class I and MHC class II prediction algorithms were run, this folder combines the neoepitope predictions from both
Each folder will contain the same list of output files (listed in the order created):
File Name | Description |
---|---|
<sample_name>.tsv | An intermediate file with variant and transcript information parsed from the input file(s). |
<sample_name>.tsv_<chunks> (multiple) | The above file but split into smaller chunks for easier processing with IEDB. |
<sample_name>.all_epitopes.tsv | A list of all predicted epitopes and their binding affinity scores, with additional variant information from the <sample_name>.tsv. |
<sample_name>.filtered.tsv | The above file after applying all filters, with cleavage site and stability predictions added. |
<sample_name>.filtered.condensed.ranked.tsv | A condensed version of the filtered TSV with only the most important columns remaining, with a priority score for each neoepitope candidate added. |
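As a rough illustration of the layout described above, the sketch below builds the report paths one would expect for a given sample. The folder and file names come from the lists above; the helper names `expected_report_paths` and `existing_reports` and the `output_dir` argument are hypothetical and not part of pVACfuse. The chunked `<sample_name>.tsv_<chunks>` files are left out because their suffixes are not specified here.

```python
from pathlib import Path

# Folder and file names are taken from the documentation above.
# The helper functions themselves are hypothetical, for illustration only.
RESULT_FOLDERS = ("MHC_Class_I", "MHC_Class_II", "combined")
REPORT_SUFFIXES = (
    ".tsv",
    ".all_epitopes.tsv",
    ".filtered.tsv",
    ".filtered.condensed.ranked.tsv",
)


def expected_report_paths(output_dir, sample_name):
    """Map each result folder to the report paths expected inside it."""
    base = Path(output_dir)
    return {
        folder: [base / folder / f"{sample_name}{suffix}" for suffix in REPORT_SUFFIXES]
        for folder in RESULT_FOLDERS
    }


def existing_reports(output_dir, sample_name):
    """Keep only paths that exist on disk.

    Folders for prediction classes that were not run will simply be absent.
    """
    return {
        folder: [path for path in paths if path.exists()]
        for folder, paths in expected_report_paths(output_dir, sample_name).items()
    }
```

Filtering on existence is a convenient way to handle runs where only class I or only class II algorithms were chosen.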
all_epitopes.tsv and filtered.tsv Report Columns
In order to keep the outputs consistent, pVACfuse uses the same output columns
as pVACseq, but some of the values will be NA if a column doesn't apply to pVACfuse.
Column Name | Description |
---|---|
Chromosome | The chromosome of the 5p and 3p portion of the fusion, separated by " / " |
Start | The start position of the 5p and 3p portion of the fusion, separated by " / " |
Stop | The stop position of the 5p and 3p portion of the fusion, separated by " / " |
Reference | fusion |
Variant | fusion |
Transcript | The Ensembl IDs of the affected transcripts |
Transcript Support Level | NA |
Ensembl Gene ID | NA |
Variant Type | The type of fusion. inframe_fusion for inframe fusions, frameshift_fusion for frameshift fusions |
Mutation | NA |
Protein Position | The position of the fusion in the fusion protein sequence |
Gene Name | The Ensembl gene names of the affected genes |
HGVSc | NA |
HGVSp | NA |
HLA Allele | The HLA allele for this prediction |
Peptide Length | The peptide length of the epitope |
Sub-peptide Position | The one-based position of the epitope in the protein sequence used to make the prediction |
Mutation Position | NA |
MT Epitope Seq | Mutant epitope sequence |
WT Epitope Seq | NA |
Best MT Score Method | Prediction algorithm with the lowest mutant ic50 binding affinity for this epitope |
Best MT Score | Lowest ic50 binding affinity of all prediction algorithms used |
Corresponding WT Score | NA |
Corresponding Fold Change | NA |
Tumor DNA Depth | NA |
Tumor DNA VAF | NA |
Tumor RNA Depth | NA |
Tumor RNA VAF | NA |
Normal Depth | NA |
Normal VAF | NA |
Gene Expression | NA |
Transcript Expression | NA |
Median MT Score | Median ic50 binding affinity of the mutant epitope of all prediction algorithms used |
Median WT Score | NA |
Median Fold Change | NA |
Individual Prediction Algorithm WT and MT Scores (multiple) | ic50 scores for the MT Epitope Seq and WT Epitope Seq for the individual prediction algorithms used |
cterm_7mer_gravy_score | Mean hydropathy of the last 7 residues on the C-terminus of the peptide |
max_7mer_gravy_score | Max GRAVY score of any kmer in the amino acid sequence. Used to determine if there are any extremely hydrophobic regions within a longer amino acid sequence. |
difficult_n_terminal_residue (T/F) | Is the N-terminal amino acid a Glutamine, Glutamic acid, or Cysteine? |
c_terminal_cysteine (T/F) | Is the C-terminal amino acid a Cysteine? |
c_terminal_proline (T/F) | Is the C-terminal amino acid a Proline? |
cysteine_count | Number of Cysteines in the amino acid sequence. Problematic because they can form disulfide bonds across distant parts of the peptide |
n_terminal_asparagine (T/F) | Is the N-terminal amino acid an Asparagine? |
asparagine_proline_bond_count | Number of Asparagine-Proline bonds. Problematic because they can spontaneously cleave the peptide |
Best Cleavage Position (optional) | Position of the highest predicted cleavage score |
Best Cleavage Score (optional) | Highest predicted cleavage score |
Cleavage Sites (optional) | List of all cleavage positions and their cleavage score |
Predicted Stability (optional) | Stability of the pMHC-I complex |
Half Life (optional) | Half-life of the pMHC-I complex |
Stability Rank (optional) | The % rank stability of the pMHC-I complex |
NetMHCstab allele (optional) | Nearest neighbor to the HLA Allele. Used for NetMHCstab prediction |
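When reading these reports programmatically, two details from the table matter: columns that do not apply to fusions hold the literal value NA, and the Chromosome, Start, and Stop columns pack the 5p and 3p values into one field. The following is a minimal sketch using only the Python standard library. It assumes the separator is exactly a space, a slash, and a space; the helper names are hypothetical and the example row is made up for illustration.

```python
import csv
import io


def parse_value(value):
    """Turn the literal string 'NA' into None; leave other values as text."""
    return None if value == "NA" else value


def split_fusion_field(value):
    """Split a '5p / 3p' field (Chromosome, Start, or Stop) into a pair.

    Assumes the separator is exactly ' / '.
    """
    five_prime, three_prime = value.split(" / ")
    return five_prime, three_prime


def read_report(handle):
    """Read a tab-separated report into a list of dicts with NA mapped to None."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{key: parse_value(val) for key, val in row.items()} for row in reader]


# Made-up example content, for illustration only (not real pVACfuse output).
example = (
    "Chromosome\tStart\tStop\tVariant Type\tWT Epitope Seq\tMedian MT Score\n"
    "chr9 / chr22\t1000 / 2000\t1500 / 2500\tinframe_fusion\tNA\t123.4\n"
)
rows = read_report(io.StringIO(example))
row = rows[0]
chrom_5p, chrom_3p = split_fusion_field(row["Chromosome"])
median_score = float(row["Median MT Score"])
```

For a real file, the same `read_report` function can be given a handle opened with `open(path, newline="")`.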
filtered.condensed.ranked.tsv Report Columns
Column Name | Description |
---|---|
Gene Name | The Ensembl gene names of the affected genes |
Mutation | NA |
Protein Position | The position of the fusion in the fusion protein sequence |
HGVSc | NA |
HGVSp | NA |
HLA Allele | The HLA allele for this prediction. |
Mutation Position | NA |
MT Epitope Seq | Mutant epitope sequence. |
Median MT Score | Median ic50 binding affinity of the mutant epitope across all prediction algorithms used |
Median WT Score | NA |
Median Fold Change | NA |
Best MT Score | Lowest ic50 binding affinity of all prediction algorithms used |
Corresponding WT Score | NA |
Corresponding Fold Change | NA |
Tumor DNA Depth | NA |
Tumor DNA VAF | NA |
Tumor RNA Depth | NA |
Tumor RNA VAF | NA |
Gene Expression | NA |
Rank | A priority rank for the neoepitope (best = 1). |
The pVACfuse Neoepitope Priority Rank
The underlying formula for calculating the pVACfuse rank is the same as it is
for the pVACseq Neoepitope Priority Rank. However, since only the binding
affinity is available for fusion predictions, pVACfuse simply ranks the
neoepitopes according to their binding affinity, with the lowest being the
best. If --top-score-metric is set to median (default), the Median MT Score
is used. If it is set to lowest, the Best MT Score is used.
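The ranking rule described above can be sketched in a few lines of Python. This is an illustration of the stated behavior, not pVACfuse's actual implementation: the function name `rank_neoepitopes` is hypothetical, the example rows are made up, and ties are assumed to keep their input order, which the text above does not specify.

```python
# Column used for ranking under each --top-score-metric value, as described above.
METRIC_COLUMNS = {"median": "Median MT Score", "lowest": "Best MT Score"}


def rank_neoepitopes(rows, top_score_metric="median"):
    """Return copies of the rows with a 1-based 'Rank' (lowest score = rank 1).

    Hypothetical helper. Ties keep their input order because sorted() is
    stable; how pVACfuse itself breaks ties is not covered here.
    """
    if top_score_metric not in METRIC_COLUMNS:
        raise ValueError(f"unknown top score metric: {top_score_metric!r}")
    column = METRIC_COLUMNS[top_score_metric]
    ordered = sorted(rows, key=lambda r: float(r[column]))
    return [dict(r, Rank=position) for position, r in enumerate(ordered, start=1)]


# Made-up candidates, for illustration only.
candidates = [
    {"MT Epitope Seq": "AAAAAAAAA", "Median MT Score": "250.0", "Best MT Score": "40.0"},
    {"MT Epitope Seq": "CCCCCCCCC", "Median MT Score": "90.5", "Best MT Score": "75.0"},
]
ranked_by_median = rank_neoepitopes(candidates)
ranked_by_lowest = rank_neoepitopes(candidates, top_score_metric="lowest")
```

With these made-up values the two metrics disagree on the top candidate, which is why the choice of `--top-score-metric` can change the final ordering.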