
Adding genotype sample information to your VCF

pVACseq was primarily designed for clinical applications. As such, it requires the input VCF to contain sample genotype information (the GT field), which identifies whether a variant was called in a specific sample of interest.

Some variant callers (e.g., Strelka), however, do not include this field. In other use cases, you might want to run pVACseq on a list of variants of interest. If your input VCF does not contain sample information (i.e., it has no FORMAT column and/or no sample column), or if its FORMAT field does not include a GT entry, you will need to preprocess your VCF to add this information.
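
To see whether a VCF needs this preprocessing step, the conditions above can be checked with a few lines of Python. The following is only a minimal sketch and not part of pVACseq or VAtools: the function name vcf_has_genotypes is hypothetical, and it assumes a tab-delimited, uncompressed VCF whose FORMAT column sits in the standard ninth position.

```python
def vcf_has_genotypes(lines, sample_name=None):
    """Return True if the VCF has a sample column and every record lists GT.

    lines: iterable of VCF text lines, e.g. an open file handle.
    sample_name: optional; if given, this sample must appear in the #CHROM header.
    (Hypothetical helper for illustration; not a pVACseq or VAtools function.)
    """
    samples = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns 1-8 are fixed; FORMAT is column 9 and samples follow it.
            if len(fields) < 10 or fields[8] != "FORMAT":
                return False
            samples = fields[9:]
            if sample_name is not None and sample_name not in samples:
                return False
            continue
        if samples is None:
            return False  # a record appeared before the #CHROM header
        # The FORMAT column is a colon-separated list of keys such as GT:DP.
        if len(fields) < 10 or "GT" not in fields[8].split(":"):
            return False
    return samples is not None
```

Called on an open file handle, the function returns False whenever the FORMAT column, the sample column, or the GT key is missing, which are the cases in which the VCF needs to be preprocessed.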

This information can be added using the VAtools vcf-genotype-annotator.

Using the vcf-genotype-annotator to add genotype information to your VCF

Installing the vcf-genotype-annotator

The vcf-genotype-annotator is part of the vatools package. Please visit vatools.org for more details on this package. You can install this package by running:

pip install vatools

Running the vcf-genotype-annotator

Example vcf-genotype-annotator commands

vcf-genotype-annotator <input_vcf> <sample_name> 0/1 -o <gt_annotated_vcf>

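The sample_name argument sets the sample name written to the #CHROM header line of your VCF when the tool adds a new sample. If you want to add a GT field to a sample that already exists in your VCF, this argument must match the name of that sample. The third argument (0/1 in the example above) is the genotype value to add; in VCF notation, 0/1 denotes a heterozygous call.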
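
Because sample_name has to match an existing sample when you add GT to it, it can help to list the sample names already present in the VCF first. The sketch below is an illustration only: vcf_sample_names is a hypothetical helper, not a VAtools function, and it assumes a tab-delimited, uncompressed VCF.

```python
def vcf_sample_names(lines):
    """Return the sample names listed in the #CHROM header line of a VCF.

    lines: iterable of VCF text lines, e.g. an open file handle.
    Returns an empty list if there is no header or no sample column.
    (Hypothetical helper for illustration; not a VAtools function.)
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\r\n").split("\t")
            # Eight fixed columns plus FORMAT come first; samples start at column 10.
            return fields[9:]
    return []
```

An empty result means the VCF has no sample column, so any sample_name you pass would be added as a new sample.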

Please see the VAtools documentation for more information.