Adding expression data to your VCF

pVACseq is able to parse coverage and expression information directly from the VCF. The expected annotation format is outlined below.

Type                    VCF Sample                          Format Fields
Transcript Expression   single-sample VCF or sample_name    TX
Gene Expression         single-sample VCF or sample_name    GX

Transcript Expression

If the VCF is a single-sample VCF, pVACseq assumes that this sample is the tumor sample. If the VCF is a multi-sample VCF, pVACseq will look for the sample using the sample_name parameter and treat that sample as the tumor sample.

For this tumor sample the transcript expression is determined from the TX format field. The TX format field is a comma-separated list of per-transcript expression values, where each individual transcript expression is listed as expression_id|expression_value, e.g. ENST00000215794|2.35912,ENST00000215795|0.2. The expression_id needs to match the Feature field of the VEP CSQ annotation. In other words, your expression abundance estimation should have been performed with the same transcript annotation version that you used to annotate your variants with VEP (e.g. Ensembl v95).
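The TX layout described above can be illustrated with a minimal Python sketch. This is not pVACseq's own parser. The function name parse_expression_field is hypothetical, and the sketch assumes the field value is available as the raw string from the sample column, with "." standing for a missing value.

```python
def parse_expression_field(value):
    """Parse a TX-style string into a dict of identifier -> expression value.

    Hypothetical helper for illustration only; it assumes entries are
    separated by commas and each entry is written as id|value.
    """
    expression = {}
    # A missing FORMAT value is conventionally written as "." in a VCF.
    if value in (None, "", "."):
        return expression
    for entry in value.split(","):
        identifier, _, raw_value = entry.partition("|")
        if not identifier or not raw_value:
            raise ValueError(f"Malformed expression entry: {entry!r}")
        expression[identifier] = float(raw_value)
    return expression


# Example string taken from the text above.
tx = parse_expression_field("ENST00000215794|2.35912,ENST00000215795|0.2")
for transcript_id, value in tx.items():
    print(transcript_id, value)
```

Because the GX field uses the same id|value layout, the same sketch would apply to gene expression strings.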

Gene Expression

If the VCF is a single-sample VCF, pVACseq assumes that this sample is the tumor sample. If the VCF is a multi-sample VCF, pVACseq will look for the sample using the sample_name parameter and treat that sample as the tumor sample.

For this tumor sample the gene expression is determined from the GX format field. The GX format field is a comma-separated list of per-gene expression values, where each individual gene expression is listed as gene_id|expression_value, e.g. ENSG00000184979|2.35912. The gene_id needs to match the Gene field of the VEP CSQ annotation.
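One way to check the ID-matching requirement is to compare the identifiers in the expression field against the corresponding CSQ subfield. The sketch below is an illustration under stated assumptions, not part of pVACseq: the helper names are hypothetical, and the CSQ format string is shortened to five subfields (a real VEP header lists many more, in the order given by its Format description).

```python
def csq_field_values(csq_format, csq_value, field):
    """Return one CSQ subfield's value for each comma-separated annotation.

    csq_format is the pipe-separated list of subfield names from the VCF header;
    csq_value is the CSQ INFO value of a single variant.
    """
    index = csq_format.split("|").index(field)
    return [annotation.split("|")[index] for annotation in csq_value.split(",")]


def ids_without_expression(expression_field, annotated_ids):
    """List annotated IDs that have no entry in a GX- or TX-style string."""
    expressed = {entry.split("|")[0] for entry in expression_field.split(",")}
    return sorted(set(annotated_ids) - expressed - {""})


# Shortened, hypothetical CSQ layout and value, for illustration only.
csq_format = "Allele|Consequence|Gene|Feature_type|Feature"
csq_value = "A|missense_variant|ENSG00000184979|Transcript|ENST00000215794"

genes = csq_field_values(csq_format, csq_value, "Gene")
print(ids_without_expression("ENSG00000184979|2.35912", genes))
```

An empty list means every annotated gene has an expression value. IDs that do appear in the list often point to a mismatch between the annotation versions used for expression estimation and for VEP. Passing "Feature" instead of "Gene" would run the same check for TX.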

Using the vcf-expression-annotator to add expression information to your VCF

The vcf-expression-annotator adds expression information to your VCF. It accepts output from Cufflinks, Kallisto, and StringTie, and also provides a custom option for any tab-delimited file.

Installing the vcf-expression-annotator

The vcf-expression-annotator is part of the vatools package (vatools.org). You can install this package by running:

pip install vatools

Running the vcf-expression-annotator

You can now use the output file from your expression caller to add expression information to your VCF:

vcf-expression-annotator input_vcf expression_file kallisto|stringtie|cufflinks|custom gene|transcript

The data type argument, gene or transcript, identifies whether you are annotating gene or transcript expression data. Transcript expression annotations are written to the TX format field, while gene expression annotations are written to the GX format field. Please see the VAtools documentation for more information.
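As a rough illustration of how the positional arguments fit together, the following Python sketch assembles the command shown above and rejects unsupported choices before anything is run. The function name and the file names input.vcf and abundance.tsv are hypothetical. Only the positional syntax from this page is used, so any additional options the tool may need (for example for the custom format) are deliberately left out.

```python
SUPPORTED_TOOLS = ("kallisto", "stringtie", "cufflinks", "custom")
# Data type -> format field the annotation is written to, as described above.
FORMAT_FIELD = {"gene": "GX", "transcript": "TX"}


def build_annotator_command(input_vcf, expression_file, tool, data_type):
    """Build the vcf-expression-annotator argument list (hypothetical helper)."""
    if tool not in SUPPORTED_TOOLS:
        raise ValueError(f"Unsupported expression tool: {tool!r}")
    if data_type not in FORMAT_FIELD:
        raise ValueError(f"Data type must be 'gene' or 'transcript', got {data_type!r}")
    return ["vcf-expression-annotator", input_vcf, expression_file, tool, data_type]


command = build_annotator_command("input.vcf", "abundance.tsv", "kallisto", "transcript")
print(" ".join(command))
print("Annotations go to the", FORMAT_FIELD["transcript"], "format field")
# To actually run it, the list could be passed to subprocess.run(command, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems if file paths contain spaces.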