The input to the pVACseq pipeline is a VEP-annotated single-sample VCF. In addition to the standard VEP annotations, pVACseq also requires the annotations provided by the Downstream and Wildtype VEP plugins.
To create a VCF for use with pVACseq, follow these steps:
- Download and install the VEP command line tool following these instructions.
- Download the VEP_plugins from their GitHub repository.
- Copy the Wildtype plugin provided with the pVACseq package to the folder with the other VEP_plugins:
- Run VEP on the input VCF with at least the following options:
--format vcf --vcf --symbol --plugin Downstream --plugin Wildtype --terms SO
The --dir_plugins <VEP_plugins directory> option may need to be set, depending on where the VEP_plugins were installed.
The --pick option might be useful to limit the annotation to the top transcript for each variant. Otherwise, VEP will annotate each variant with all possible transcripts. pVACseq will provide predictions for all transcripts in the VEP CSQ field. Running VEP without the --pick option can therefore drastically increase the runtime of pVACseq.
Additional VEP options that might be desired can be found here.
Example VEP Command
perl variant_effect_predictor.pl \
    --input_file <input VCF> --format vcf --output_file <output VCF> \
    --vcf --symbol --terms SO --plugin Downstream --plugin Wildtype \
    [--dir_plugins <VEP_plugins directory>]
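For scripted runs, the example command above can also be assembled programmatically. The following is a minimal Python sketch and is not part of pVACseq: the function name build_vep_command, its parameters, and the file names are hypothetical, and only the VEP options listed in this section are used.

```python
def build_vep_command(input_vcf, output_vcf, plugins_dir=None, pick=False):
    """Return the VEP invocation as an argument list (hypothetical helper)."""
    cmd = [
        "perl", "variant_effect_predictor.pl",
        "--input_file", input_vcf,
        "--format", "vcf",
        "--output_file", output_vcf,
        "--vcf", "--symbol",
        "--terms", "SO",
        "--plugin", "Downstream",
        "--plugin", "Wildtype",
    ]
    # Only needed when the plugins are not in VEP's default plugin location.
    if plugins_dir is not None:
        cmd += ["--dir_plugins", plugins_dir]
    # Optional: limit the annotation to reduce pVACseq runtime.
    if pick:
        cmd.append("--pick")
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(build_vep_command("input.vcf", "annotated.vcf", pick=True)))
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting problems with paths that contain spaces.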
Coverage and Expression Data
Coverage and expression data can be added to the pVACseq processing by providing bam-readcount and/or Cufflinks output files as additional input files. These additional input files must be provided as a yaml file in the following structure:
gene_expn_file: <genes.fpkm_tracking file from Cufflinks>
transcript_expn_file: <isoforms.fpkm_tracking file from Cufflinks>
normal_snvs_coverage_file: <bam-readcount output file for normal BAM and snvs>
normal_indels_coverage_file: <bam-readcount output file for normal BAM and indels>
tdna_snvs_coverage_file: <bam-readcount output file for tumor DNA BAM and snvs>
tdna_indels_coverage_file: <bam-readcount output file for tumor DNA BAM and indels>
trna_snvs_coverage_file: <bam-readcount output file for tumor RNA BAM and snvs>
trna_indels_coverage_file: <bam-readcount output file for tumor RNA BAM and indels>
Each file in this list is optional, and its entry can be omitted. If no additional files exist then this yaml file is optional and can be omitted from the list of
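Because every entry is optional, the yaml file can be generated from whichever files are available. The sketch below is one possible way to do this in Python; the helper render_additional_inputs and the example paths are hypothetical, and it assumes plain file paths that need no YAML quoting.

```python
# Keys taken from the structure described above.
ALLOWED_KEYS = [
    "gene_expn_file",
    "transcript_expn_file",
    "normal_snvs_coverage_file",
    "normal_indels_coverage_file",
    "tdna_snvs_coverage_file",
    "tdna_indels_coverage_file",
    "trna_snvs_coverage_file",
    "trna_indels_coverage_file",
]


def render_additional_inputs(files):
    """Render the yaml text, omitting entries that were not provided."""
    unknown = set(files) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"Unknown keys: {sorted(unknown)}")
    # Assumption: paths contain no characters that require YAML quoting.
    lines = [f"{key}: {files[key]}" for key in ALLOWED_KEYS if files.get(key)]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(render_additional_inputs({
        "gene_expn_file": "genes.fpkm_tracking",
        "tdna_snvs_coverage_file": "tumor_dna.snvs.readcount.tsv",
    }), end="")
```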
pVACseq optionally accepts bam-readcount files as inputs to add coverage information (depth and VAF) for downstream filtering. Depth and VAF are calculated from the read counts of the reference allele and alternate allele.
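As an illustration of the calculation, the sketch below derives depth and VAF from the two read counts. The exact formula used inside pVACseq is not stated here, so this assumes that depth is the sum of reference and alternate reads and that VAF is the alternate fraction of that depth; the function name depth_and_vaf is hypothetical.

```python
def depth_and_vaf(ref_count, alt_count):
    """Return (depth, vaf) from reference and alternate allele read counts.

    Assumption: depth = ref + alt, and VAF = alt / depth as a fraction.
    """
    depth = ref_count + alt_count
    # Guard against sites with no supporting reads at all.
    vaf = alt_count / depth if depth > 0 else 0.0
    return depth, vaf


if __name__ == "__main__":
    depth, vaf = depth_and_vaf(ref_count=30, alt_count=10)
    print(depth, vaf)  # prints: 40 0.25
```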
Follow the installation instructions on the bam-readcount GitHub page.
bam-readcount uses a bam file and a regions file as input, and the regions may contain either snvs or indels. Indel regions must be run in a special insertion-centric mode. Any mixed input regions must be split into snvs and indels, and bam-readcount must then be run on each file individually using the same bam.
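The split into snvs and indels can be sketched in Python as follows. This is an illustrative assumption rather than a prescribed procedure: variants are represented as hypothetical (chromosome, position, reference, alternate) tuples, a variant counts as an snv only when both alleles are a single base, and everything else is placed in the indel group.

```python
def is_snv(ref, alt):
    """True when both alleles are exactly one base long."""
    return len(ref) == 1 and len(alt) == 1


def split_variants(variants):
    """Split (chrom, pos, ref, alt) tuples into snvs and indels.

    Assumption: anything that is not a single-base substitution is
    treated as an indel in this sketch.
    """
    snvs, indels = [], []
    for variant in variants:
        _chrom, _pos, ref, alt = variant
        (snvs if is_snv(ref, alt) else indels).append(variant)
    return snvs, indels


if __name__ == "__main__":
    # Made-up variants, for illustration only.
    example = [("1", 100, "A", "T"), ("1", 200, "A", "ATG"), ("2", 300, "CT", "C")]
    snvs, indels = split_variants(example)
    print(len(snvs), len(indels))  # prints: 1 2
```

Each of the two groups would then be written to its own regions file and passed to a separate bam-readcount run on the same bam.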
Example bam-readcount command
bam-readcount -f <reference fasta> -l <site list> <bam_file>
The -i option must be used when running bam-readcount on indel regions in order to process indels in insertion-centric mode.
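Since the snv and indel runs differ only in the -i option, both commands can be produced by one helper. The Python sketch below is a hypothetical wrapper (the function name, parameters, and file names are assumptions) that uses only the bam-readcount options shown in this section.

```python
def build_bam_readcount_command(reference_fasta, site_list, bam_file, indels=False):
    """Return the bam-readcount invocation as an argument list (hypothetical helper)."""
    cmd = ["bam-readcount", "-f", reference_fasta, "-l", site_list]
    # Indel regions must be processed in insertion-centric mode.
    if indels:
        cmd.append("-i")
    cmd.append(bam_file)
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(build_bam_readcount_command("ref.fa", "snvs.sites", "tumor.bam")))
    print(" ".join(build_bam_readcount_command("ref.fa", "indels.sites", "tumor.bam", indels=True)))
```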
A minimum base quality of 20 is recommended which can be enabled by