Common Errors

Input VCF Sample Information

VCF contains more than one sample but sample_name is not set.

pVACseq supports running with a multi-sample VCF as input. However, in this case it requires the user to pick the sample to analyze, as only variants that are called in the specified sample will be processed.

When running a multi-sample VCF, the sample_name parameter identifies which sample to analyze. Take, for example, the following #CHROM VCF header:

#CHROM       POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR

This VCF contains two samples, NORMAL and TUMOR. Use TUMOR as the sample_name parameter to process the tumor sample, and NORMAL to process the normal sample.

If the input VCF only contains a single sample, the sample_name parameter does not need to match the sample name in the VCF.
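To see which sample names a VCF offers, it can help to read them straight from the #CHROM line. The following is a minimal sketch, not part of pVACseq: the function name sample_names_from_header is hypothetical, and it assumes a standard tab-delimited header in which the sample columns follow the nine fixed columns ending in FORMAT.

```python
def sample_names_from_header(chrom_line):
    """Return the sample IDs listed on a VCF #CHROM header line.

    Hypothetical helper for illustration; assumes a tab-delimited line.
    """
    fields = chrom_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("not a #CHROM header line")
    # Columns 1-9 are CHROM..FORMAT; everything after that is a sample ID.
    return fields[9:]


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR"
print(sample_names_from_header(header))  # ['NORMAL', 'TUMOR']
```

A header without FORMAT and sample columns yields an empty list.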

sample_name not a sample ID in the #CHROM header of VCF

This error occurs when running a multi-sample VCF and the sample_name parameter doesn’t match any of the sample IDs in the VCF #CHROM header. Take, for example, the following #CHROM header:

#CHROM       POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR

All columns after FORMAT are sample identifiers that can be used as the sample_name parameter when running pVACseq, depending on which sample the user wishes to process. Change the sample_name parameter of your pvacseq run command to match one of them.

normal_sample_name not a sample ID in the #CHROM header of VCF

Your pvacseq run command included the --normal-sample-name parameter. However, the argument chosen did not match any of the sample identifiers in the #CHROM header of the input VCF.

Take, for example, the following #CHROM VCF header:

#CHROM       POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR

All columns after FORMAT are sample identifiers that can be used as the --normal-sample-name parameter when running pVACseq, depending on which sample is the normal sample in the VCF. Change the --normal-sample-name parameter of your pvacseq run command to match the appropriate sample identifier.
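The rules described in this section can be checked before launching a run. The sketch below is an illustration of those rules only, not pVACseq's own validation code; the function name check_sample_args and its messages are made up for this example. It takes the list of sample IDs from the #CHROM header plus the values you intend to pass as sample_name and --normal-sample-name.

```python
def check_sample_args(vcf_samples, sample_name, normal_sample_name=None):
    """Return a list of problems with the chosen sample names (empty if none).

    Hypothetical pre-flight check mirroring the rules described above.
    """
    problems = []
    # With a single-sample VCF, sample_name does not need to match.
    if len(vcf_samples) > 1 and sample_name not in vcf_samples:
        problems.append(
            f"sample_name {sample_name!r} is not one of {vcf_samples}"
        )
    # A normal sample name, when given, must be a sample ID in the header.
    if normal_sample_name is not None and normal_sample_name not in vcf_samples:
        problems.append(
            f"normal_sample_name {normal_sample_name!r} is not one of {vcf_samples}"
        )
    return problems


print(check_sample_args(["NORMAL", "TUMOR"], "TUMOR", "NORMAL"))  # []
```

Sample IDs are compared exactly, so a value such as tumor would not match TUMOR in this sketch.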

VCF doesn’t contain any sample genotype information.

pVACseq uses the sample genotype to identify which variants were called. Therefore, while a VCF without FORMAT and sample columns is valid, it cannot be used with pVACseq. You will need to manually edit your VCF and add a FORMAT column and a sample column containing the GT genotype field. For more information on this formatting, please see the VCF specification for your specific VCF version.
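A quick way to tell whether a VCF carries genotype information is to look at the #CHROM line and at the FORMAT column of a record. The helpers below are a hypothetical sketch (the names has_sample_columns and record_has_gt are not from pVACseq) and assume tab-delimited lines as laid out in the VCF specification.

```python
def has_sample_columns(chrom_line):
    """True if the #CHROM line has FORMAT followed by at least one sample."""
    fields = chrom_line.rstrip("\n").split("\t")
    return len(fields) >= 10 and fields[8] == "FORMAT"


def record_has_gt(record_line):
    """True if a VCF record lists GT among its colon-separated FORMAT keys."""
    fields = record_line.rstrip("\n").split("\t")
    return len(fields) >= 10 and "GT" in fields[8].split(":")


sites_only = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
record = "1\t12345\t.\tA\tG\t.\tPASS\t.\tGT:AD\t0/1:10,5"
print(has_sample_columns(sites_only))  # False
print(record_has_gt(record))           # True
```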

Input VCF Compression and Indexing

Input VCF needs to be bgzipped when running with a proximal variants VCF.

When running pVACseq with the --proximal-variants-vcf argument, the main input VCF needs to be bgzipped and tabix indexed. See the Input File Preparation section of the documentation for instructions on how to do so.

Proximal variants VCF needs to be bgzipped.

The VCF provided via the --proximal-variants-vcf argument needs to be bgzipped and tabix indexed. See the Input File Preparation section of the documentation for instructions on how to do so.

No .tbi file found for input VCF. Input VCF needs to be tabix indexed if processing with proximal variants.

When running pVACseq with the --proximal-variants-vcf argument, the main input VCF needs to be bgzipped and tabix indexed. See the Input File Preparation section of the documentation for instructions on how to do so.

No .tbi file found for proximal variants VCF. Proximal variants VCF needs to be tabix indexed.

The VCF provided via the --proximal-variants-vcf argument needs to be bgzipped and tabix indexed. See the Input File Preparation section of the documentation for instructions on how to do so.
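Both requirements in this section can be checked before running pVACseq. The sketch below is a heuristic, not the check pVACseq itself performs: it assumes the BGZF block header layout that bgzip writes (gzip magic bytes with the extra-field flag set and a "BC" subfield at byte offset 12) and assumes the index sits next to the VCF with a .tbi suffix. The function names are hypothetical.

```python
import os


def is_bgzf_header(first_bytes):
    """Heuristic: do these leading bytes look like a BGZF (bgzip) block header?"""
    return (
        len(first_bytes) >= 14
        and first_bytes[:4] == b"\x1f\x8b\x08\x04"  # gzip magic, deflate, FEXTRA set
        and first_bytes[12:14] == b"BC"             # BGZF extra subfield identifier
    )


def check_compressed_vcf(vcf_path):
    """Return (looks_bgzipped, has_tbi_index) for the file at vcf_path."""
    with open(vcf_path, "rb") as handle:
        bgzipped = is_bgzf_header(handle.read(14))
    indexed = os.path.exists(vcf_path + ".tbi")
    return bgzipped, indexed
```

Calling check_compressed_vcf on both the main input VCF and the proximal variants VCF shows which of the four errors above to expect. A file compressed with plain gzip rather than bgzip fails the first check.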

Input VCF VEP Annotation

Input VCF does not contain a CSQ header. Please annotate the VCF with VEP before running it.

pVACseq requires the input VCF to be annotated by VEP. The provided input VCF doesn’t contain a CSQ INFO header. This indicates that it has not been annotated. The Input File Preparation section of the documentation provides instructions on how to annotate your VCF with VEP.

VCF doesn’t contain VEP FrameshiftSequence annotations. Please re-annotate the VCF with VEP and the Wildtype and Frameshift plugins.

Although the input VCF was annotated with VEP, it is missing the required annotations provided by the VEP Frameshift plugin. The input VCF will need to be reannotated using all of the required arguments as outlined in the Input File Preparation section of the documentation.

VCF doesn’t contain VEP WildtypeProtein annotations. Please re-annotate the VCF with VEP and the Wildtype and Frameshift plugins.

Although the input VCF was annotated with VEP, it is missing the required annotations provided by the VEP Wildtype plugin. The input VCF will need to be reannotated using all of the required arguments as outlined in the Input File Preparation section of the documentation.

Proximal Variants VCF does not contain a CSQ header. Please annotate the VCF with VEP before running it.

When running pVACseq with the --proximal-variants-vcf argument, that proximal variants VCF needs to be annotated by VEP. The provided proximal variants VCF doesn’t contain a CSQ INFO header. This indicates that it has not been annotated. The Input File Preparation section of the documentation provides instructions on how to annotate your VCF with VEP.
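The annotation errors in this section can be anticipated by inspecting the VCF header. The sketch below is illustrative only and not pVACseq's implementation: it assumes the CSQ INFO header written by VEP lists its pipe-separated field names after "Format:" in the Description, and it looks for the FrameshiftSequence and WildtypeProtein fields named in the error messages. The function names are hypothetical.

```python
import re

REQUIRED_PLUGIN_FIELDS = ("FrameshiftSequence", "WildtypeProtein")


def csq_fields(header_lines):
    """Return the CSQ field names, or None if there is no CSQ INFO header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^">]+)', line)
            return match.group(1).strip().split("|") if match else []
    return None


def missing_plugin_fields(header_lines):
    """Return the required plugin fields absent from the CSQ header."""
    fields = csq_fields(header_lines)
    if fields is None:
        # No CSQ header at all: the VCF was not annotated with VEP.
        return list(REQUIRED_PLUGIN_FIELDS)
    return [name for name in REQUIRED_PLUGIN_FIELDS if name not in fields]


example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|Amino_acids|WildtypeProtein">',
]
print(missing_plugin_fields(example_header))  # ['FrameshiftSequence']
```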

There was a mismatch between the actual wildtype amino acid sequence and the expected amino acid sequence. Did you use the same reference build version for VEP that you used for creating the VCF?

This error occurs when the reference nucleotide at a specific position is different from the Ensembl transcript nucleotide at the same position. As a result, the wildtype amino acid in the Amino_acids VEP annotation differs from the amino acid at that position in the transcript protein sequence predicted by the Wildtype plugin. The Amino_acids VEP annotation is based on the reference and alternate nucleotides of the variant, while the WildtypeProtein prediction is based on the Ensembl transcript nucleotide sequence.

This points to a fundamental disagreement between the reference that was used during alignment and variant calling and the Ensembl reference. This mismatch cannot be resolved by pVACseq, which is why this error is fatal.
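The comparison behind this error can be illustrated for the simplest case, a missense variant. The sketch below is a simplification under stated assumptions and may differ from pVACseq's actual check: it assumes Amino_acids has the form "reference/alternate" (for example R/H), that Protein_position is a single 1-based position, and that WildtypeProtein holds the full wildtype protein sequence. The function name wildtype_matches is hypothetical.

```python
def wildtype_matches(amino_acids, protein_position, wildtype_protein):
    """Check that the reference amino acid agrees with the wildtype protein.

    Simplified to missense variants; insertions, deletions and position
    ranges are deliberately not handled in this sketch.
    """
    reference_aa = amino_acids.split("/")[0]
    start = int(protein_position) - 1  # VEP positions are 1-based
    return wildtype_protein[start:start + len(reference_aa)] == reference_aa


# Toy sequences, invented for illustration.
print(wildtype_matches("R/H", "3", "MKRLL"))  # True: residue 3 is R
print(wildtype_matches("R/H", "3", "MKQLL"))  # False: transcript has Q, not R
```

A False result corresponds to the situation described above, where the VCF reference and the Ensembl transcript disagree.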

Here are a few things that might resolve this error:

  • Checking that the build of the VEP cache matches the alignment build and downloading the correct cache if there is a build mismatch (such as a build 38 cache with a build 37 VCF, or vice versa)

  • Using the --assembly parameter during VEP annotation with the correct build version to match your VCF

  • Using the --fasta parameter during VEP annotation with the reference used to create the VCF

  • Manually fixing the reference bases in your VCF to match the one expected by Ensembl

  • Realigning and redoing variant calling on your sample with a reference that matches what is expected by VEP

If this mismatch cannot be resolved, the VCF cannot be used by pVACseq. We created the ref-transcript-mismatch-reporter tool to identify and remove such variants from your VCF. The tool is available as part of VAtools: https://vatools.readthedocs.io/en/latest/ref_transcript_mismatch_reporter.html.

Other

The TSV file is empty. Please check that the input VCF contains missense, inframe indel, or frameshift mutations.

None of the variants in the VCF file are supported by pVACseq. This can happen either because none of the variants have a protein-altering consequence or because none of the variants are called in the sample (i.e., none have a 0/1 or 1/1 genotype). If you are using the --pass-only option, it might also be the case that all supported variants are filtered out.
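To check whether any variants are called in your sample, you can inspect the GT values directly. The sketch below is one possible reading of "called", namely that the genotype contains at least one non-reference allele; treating multi-allelic genotypes such as 1/2 as called is an assumption of this example, and is_called is a hypothetical name rather than a pVACseq function.

```python
import re


def is_called(genotype):
    """True if the GT string contains at least one non-reference allele."""
    alleles = re.split(r"[/|]", genotype)  # unphased (/) or phased (|) separator
    return any(allele.isdigit() and allele != "0" for allele in alleles)


for gt in ("0/1", "1/1", "0|1", "0/0", "./."):
    print(gt, is_called(gt))
```

If every record in the chosen sample column has a 0/0 or ./. genotype, the resulting TSV file would be empty.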

Illegal instruction (core dumped)

This issue may occur when you run the TensorFlow-based prediction algorithms MHCnuggets and/or MHCflurry. It indicates that your computer’s hardware does not support the installed version of TensorFlow. Manually downgrading TensorFlow to version 1.5.0 (pip install tensorflow==1.5.0) should solve this problem.