Common Errors
Input VCF Sample Information
VCF contains more than one sample but sample_name is not set.
pVACseq supports running with a multi-sample VCF as input. However, in this case it requires the user to pick the sample to analyze, as only variants that are called in the specified sample will be processed.
When running a multi-sample VCF, the sample_name parameter is used to identify which sample to analyze. Take, for example, the following #CHROM VCF header:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
This VCF contains two samples, NORMAL and TUMOR. Use TUMOR as the sample_name parameter to process the tumor sample, and NORMAL to process the normal sample.
If the input VCF only contains a single sample, the sample_name parameter does not need to match the sample name in the VCF.
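To see which values are valid for sample_name, you can read the sample IDs straight from the #CHROM line. The following is a minimal sketch, not part of pVACseq; the function name sample_names is hypothetical. It splits on any whitespace so that it also works on the space-separated example above, although real VCF headers are tab-delimited.

```python
def sample_names(chrom_header_line):
    """Return the sample IDs listed after FORMAT in a #CHROM header line."""
    columns = chrom_header_line.strip().split()
    if not columns or columns[0] != "#CHROM":
        raise ValueError("not a #CHROM header line")
    if "FORMAT" not in columns:
        return []  # no FORMAT column means there are no sample columns
    return columns[columns.index("FORMAT") + 1:]


header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR"
print(sample_names(header))  # ['NORMAL', 'TUMOR']
```

Any of the returned names can be passed as sample_name.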
sample_name not a sample ID in the #CHROM header of VCF
This error occurs when running a multi-sample VCF and the sample_name parameter doesn’t match any of the sample IDs in the VCF #CHROM header.
Take, for example, the following #CHROM header:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
All columns after FORMAT are sample identifiers that can be used as the sample_name parameter when running pVACseq, depending on which sample the user wishes to process. Change the sample_name parameter of your pvacseq run command to match one of them.
normal_sample_name not a sample ID in the #CHROM header of VCF
Your pvacseq run command included the --normal-sample-name parameter. However, the argument chosen did not match any of the sample identifiers in the #CHROM header of the input VCF.
Take, for example, the following #CHROM VCF header:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
All columns after FORMAT are sample identifiers that can be used as the --normal-sample-name parameter when running pVACseq, depending on which sample is the normal sample in the VCF. Change the --normal-sample-name parameter of your pvacseq run command to match the appropriate sample identifier.
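Before launching a run, both names can be checked against the file itself. This is an illustrative sketch under stated assumptions, not the check pVACseq performs: the helper name missing_sample_names is hypothetical, the VCF is assumed to be tab-delimited, and files ending in .gz are assumed to be gzip or bgzip compressed.

```python
import gzip


def missing_sample_names(vcf_path, *names):
    """Return the given names that are not sample IDs in the VCF's #CHROM header."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                samples = columns[9:]  # the eight fixed columns, FORMAT, then samples
                return [name for name in names if name not in samples]
    raise ValueError("no #CHROM header line found")
```

For example, missing_sample_names("input.vcf", "TUMOR", "NORMAL") returns an empty list when both names appear in the header; the file name here is a placeholder.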
VCF doesn’t contain any sample genotype information.
pVACseq uses the sample genotype to identify which variants were called. Therefore, while a VCF without a FORMAT column and sample column(s) is valid, it cannot be used in pVACseq. You will need to manually edit your VCF and add a FORMAT column and a sample column with the GT genotype field. For more information on this formatting, please see the VCF specification for your specific VCF version.
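A quick way to tell whether a VCF carries genotype information is to look at the #CHROM line and one data line. The sketch below is illustrative only: has_genotype is a hypothetical helper, and it assumes tab-delimited lines with FORMAT as the ninth column, as laid out in the VCF specification.

```python
def has_genotype(chrom_header_line, record_line):
    """True if the VCF has a FORMAT column, a sample column, and a GT key."""
    header = chrom_header_line.rstrip("\n").split("\t")
    record = record_line.rstrip("\n").split("\t")
    if len(header) < 10 or header[8] != "FORMAT":
        return False  # sites-only VCF: no FORMAT or sample columns
    if len(record) < 10:
        return False  # the data line is missing its sample column
    return "GT" in record[8].split(":")
```

A return value of False for your file suggests it will trigger the error above.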
Input VCF Compression and Indexing
Input VCF needs to be bgzipped when running with a proximal variants VCF.
When running pVACseq with the --proximal-variants-vcf argument, the main input VCF needs to be bgzipped and tabix indexed. See the Input File Preparation section of the documentation for instructions on how to do so.
Proximal variants VCF needs to be bgzipped.
The VCF provided via the --proximal-variants-vcf argument needs to be bgzipped and tabix indexed. See the Input File Preparation section of the documentation for instructions on how to do so.
No .tbi file found for input VCF. Input VCF needs to be tabix indexed if processing with proximal variants.
When running pVACseq with the --proximal-variants-vcf argument, the main input VCF needs to be bgzipped and tabix indexed. See the Input File Preparation section of the documentation for instructions on how to do so.
No .tbi file found for proximal variants VCF. Proximal variants VCF needs to be tabix indexed.
The VCF provided via the --proximal-variants-vcf argument needs to be bgzipped and tabix indexed. See the Input File Preparation section of the documentation for instructions on how to do so.
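All four errors in this section come down to two conditions: the file is bgzip compressed, and a .tbi index sits next to it. The sketch below tests both using only the standard library. It is a heuristic, not pVACseq's own check: the function names are hypothetical, the compression test only inspects the first BGZF block header (gzip magic bytes, the extra-field flag, and the "BC" subfield that bgzip normally writes first), and the index is assumed to be named after the VCF with .tbi appended.

```python
import os


def looks_bgzipped(path):
    """Heuristic: True if the file starts with a BGZF block header."""
    with open(path, "rb") as handle:
        head = handle.read(14)
    # gzip magic (1f 8b), deflate (08), extra-field flag (04),
    # then the 'BC' extra subfield at bytes 12-13.
    return (
        len(head) == 14
        and head[:4] == b"\x1f\x8b\x08\x04"
        and head[12:14] == b"BC"
    )


def ready_for_proximal_variants(vcf_path):
    """True if vcf_path looks bgzipped and has a .tbi index beside it."""
    return looks_bgzipped(vcf_path) and os.path.exists(vcf_path + ".tbi")
```

A file compressed with plain gzip fails the first test even though it has a .gz extension, which is a common way to end up with these errors.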
Input VCF VEP Annotation
Input VCF does not contain a CSQ header. Please annotate the VCF with VEP before running it.
pVACseq requires the input VCF to be annotated by VEP. The provided input VCF doesn’t contain a CSQ INFO header. This indicates that it has not been annotated. The Input File Preparation section of the documentation provides instructions on how to annotate your VCF with VEP.
VCF doesn’t contain VEP FrameshiftSequence annotations. Please re-annotate the VCF with VEP and the Wildtype and Frameshift plugins.
Although the input VCF was annotated with VEP, it is missing the required annotations provided by the VEP Frameshift plugin. The input VCF will need to be reannotated using all of the required arguments as outlined in the Input File Preparation section of the documentation.
VCF doesn’t contain VEP WildtypeProtein annotations. Please re-annotate the VCF with VEP and the Wildtype and Frameshift plugins.
Although the input VCF was annotated with VEP, it is missing the required annotations provided by the VEP Wildtype plugin. The input VCF will need to be reannotated using all of the required arguments as outlined in the Input File Preparation section of the documentation.
Proximal Variants VCF does not contain a CSQ header. Please annotate the VCF with VEP before running it.
When running pVACseq with the --proximal-variants-vcf argument, that proximal variants VCF needs to be annotated by VEP. The provided proximal variants VCF doesn’t contain a CSQ INFO header. This indicates that it has not been annotated. The Input File Preparation section of the documentation provides instructions on how to annotate your VCF with VEP.
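The CSQ errors above can be checked ahead of time by inspecting the VCF meta-information lines. VEP lists the CSQ sub-fields after "Format:" in the Description of the CSQ INFO header, separated by "|". The following sketch relies on that layout; the function names are hypothetical and this is not the validation code pVACseq runs.

```python
import re


def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the header, or None if absent."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^">]+)', line)
            return match.group(1).strip().split("|") if match else []
    return None


def missing_vep_annotations(header_lines):
    """List what is missing: 'CSQ' itself, or the plugin fields named above."""
    fields = csq_fields(header_lines)
    if fields is None:
        return ["CSQ"]
    required = ("FrameshiftSequence", "WildtypeProtein")
    return [name for name in required if name not in fields]
```

An empty result means the header declares the CSQ field together with both plugin annotations.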
There was a mismatch between the actual wildtype amino acid sequence and the expected amino acid sequence. Did you use the same reference build version for VEP that you used for creating the VCF?
This error occurs when the reference nucleotide at a specific position is different from the Ensembl transcript nucleotide at the same position. This results in the reference (wildtype) amino acid in the Amino_acids VEP annotation being different from the amino acid of the transcript protein sequence as predicted by the Wildtype plugin. The Amino_acids VEP annotation is based on the reference and alternate nucleotides of the variant, while the WildtypeProtein prediction is based on the Ensembl transcript nucleotide sequence.
This points to a fundamental disagreement between the reference that was used during alignment and variant calling and the Ensembl reference. This mismatch cannot be resolved by pVACseq, which is why this error is fatal.
Here are a few things that might resolve this error:
Checking that the build of the VEP cache matches the alignment build and downloading the correct cache if there is a build mismatch (such as a build 38 cache with a build 37 VCF, or vice versa)
Using the --assembly parameter during VEP annotation with the correct build version to match your VCF
Using the --fasta parameter during VEP annotation with the reference used to create the VCF
Manually fixing the reference bases in your VCF to match the one expected by Ensembl
Realigning and redoing variant calling on your sample with a reference that matches what is expected by VEP
If this mismatch cannot be resolved, the VCF cannot be used by pVACseq as is. We created the ref-transcript-mismatch-reporter tool to identify and remove such variants from your VCF. The tool is available as part of VAtools: https://vatools.readthedocs.io/en/latest/ref_transcript_mismatch_reporter.html.
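To make the comparison concrete, the sketch below checks one annotation by hand. It is a simplified illustration, not the logic used by pVACseq or by ref-transcript-mismatch-reporter: the function name is hypothetical, and it assumes Protein_position is a single 1-based number and Amino_acids has the reference/alternate form (for example "R/H").

```python
def reference_matches_transcript(amino_acids, protein_position, wildtype_protein):
    """Compare the reference amino acid(s) with the WildtypeProtein sequence.

    amino_acids: VEP Amino_acids value such as "R/H" (reference/alternate).
    protein_position: 1-based VEP Protein_position such as "132".
    wildtype_protein: the WildtypeProtein sequence from the Wildtype plugin.
    """
    reference_aa = amino_acids.split("/")[0]
    start = int(protein_position) - 1  # convert to a 0-based index
    observed = wildtype_protein[start:start + len(reference_aa)]
    return observed == reference_aa
```

A False result for a variant corresponds to the kind of disagreement described above.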
Other
The TSV file is empty. Please check that the input VCF contains missense, inframe indel, or frameshift mutations.
None of the variants in the VCF file are supported by pVACseq. This could be either because none of the variants have a protein-altering consequence or because none of the variants are called in the sample (i.e. have a 0/1 or 1/1 genotype). If you are using the --pass-only option, it might also be the case that all supported variants were filtered out.
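When diagnosing an empty TSV, it can help to count how many variants are actually called in the chosen sample. The sketch below treats a genotype as called when it contains at least one non-reference allele, which matches the 0/1 and 1/1 examples above; this definition and the function name is_called are assumptions for illustration, not pVACseq's exact rule.

```python
import re


def is_called(genotype):
    """True if a GT value such as '0/1' or '1|1' carries a non-reference allele."""
    alleles = re.split(r"[/|]", genotype)
    return any(allele.isdigit() and int(allele) > 0 for allele in alleles)
```

Applying this to the GT value of every record for the selected sample shows whether any variant would survive the genotype check.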
Illegal instruction (core dumped)
This issue may occur when you are trying to run the TensorFlow-based prediction algorithms MHCnuggets and/or MHCflurry. It indicates that your computer’s hardware does not support the version of TensorFlow that is installed. Manually downgrading TensorFlow to version 1.5.0 (pip install tensorflow==1.5.0) should solve this problem.
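One frequent hardware cause, which the text above does not name and which is offered here as an assumption, is a CPU without AVX instructions: prebuilt TensorFlow packages from version 1.6 onward are compiled to use them. On Linux, the following sketch reads the CPU feature flags to check for this. The function names are hypothetical and the check applies only to systems that expose /proc/cpuinfo.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx(cpuinfo_text):
    """True if the 'avx' flag is listed among the CPU features."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:  # Linux only
            print("AVX supported:", supports_avx(handle.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo is not available on this system")
```

If AVX is reported as unsupported, the downgrade described above is the likely fix.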