Optional Downstream Analysis Tools
Generate Protein Fasta
usage: pvacseq generate_protein_fasta [-h] [--input-tsv INPUT_TSV]
                                      [-p PHASED_PROXIMAL_VARIANTS_VCF]
                                      [--mutant-only]
                                      [-d DOWNSTREAM_SEQUENCE_LENGTH]
                                      [-s SAMPLE_NAME]
                                      input_vcf flanking_sequence_length
                                      output_file

Generate an annotated fasta file from a VCF with protein sequences of
mutations and matching wildtypes

positional arguments:
  input_vcf             A VEP-annotated single- or multi-sample VCF
                        containing genotype, transcript, Wildtype protein
                        sequence, and Downstream protein sequence
                        information. The VCF may be gzipped (requires tabix
                        index).
  flanking_sequence_length
                        Number of amino acids to add on each side of the
                        mutation when creating the FASTA.
  output_file           The output fasta file.

optional arguments:
  -h, --help            show this help message and exit
  --input-tsv INPUT_TSV
                        A pVACseq all_epitopes or filtered TSV file with
                        epitopes to use for subsetting the input VCF to
                        peptides of interest. Only the peptide sequences for
                        the epitopes in the TSV will be used when creating
                        the FASTA. (default: None)
  -p PHASED_PROXIMAL_VARIANTS_VCF, --phased-proximal-variants-vcf PHASED_PROXIMAL_VARIANTS_VCF
                        A VCF with phased proximal variant information to
                        incorporate into the predicted fasta sequences. Must
                        be gzipped and tabix indexed. (default: None)
  --mutant-only         Only output mutant peptide sequences (default: False)
  -d DOWNSTREAM_SEQUENCE_LENGTH, --downstream-sequence-length DOWNSTREAM_SEQUENCE_LENGTH
                        Cap to limit the downstream sequence length for
                        frameshifts when creating the fasta file. Use 'full'
                        to include the full downstream sequence. (default:
                        1000)
  -s SAMPLE_NAME, --sample-name SAMPLE_NAME
                        The name of the sample being processed. Required when
                        processing a multi-sample VCF and must be a sample ID
                        in the input VCF #CHROM header line. (default: None)
This tool extracts protein sequences surrounding supported protein-altering variants in an input VCF file. One use case is selecting long peptides that contain short neoepitope candidates. For example, if pVACseq was run to predict nonamers (9-mers) that are good binders, the user may wish to select long peptide (e.g. 24-mer) sequences that contain the nonamer for synthesis or for encoding in a DNA vector. The extracted protein sequence corresponds to the transcript sequence used in the annotated VCF. The alteration in the VCF (e.g. a somatic missense SNV) is centered in the returned protein sequence where possible. If the variant is near the beginning or end of the CDS, it is placed as close to the center as possible while still returning the desired protein sequence length. If the variant causes a frameshift, the downstream protein sequence is returned up to the limit set by --downstream-sequence-length (default: 1000); pass 'full' to include the entire downstream sequence.
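The centering behavior described above can be illustrated with a minimal Python sketch. This is not the pVACtools implementation: the function name `extract_window`, the 0-based `mutation_index`, and the restriction to a single-residue substitution (no frameshift handling) are assumptions made for the example.

```python
def extract_window(protein: str, mutation_index: int, flank: int) -> str:
    """Return the residues around a 0-based mutation position.

    The window holds `flank` residues on each side of the mutated residue.
    Near either end of the protein the window is shifted (not shortened),
    so the mutation sits as close to the center as the sequence allows.
    Illustrative only; assumes a single-residue substitution.
    """
    if flank < 0:
        raise ValueError("flank must be non-negative")
    if not 0 <= mutation_index < len(protein):
        raise ValueError("mutation_index is outside the protein")

    length = 2 * flank + 1
    if len(protein) <= length:
        # The protein is shorter than the requested window: return all of it.
        return protein

    # Ideal start puts the mutation in the middle; clamp it to stay in bounds.
    start = mutation_index - flank
    start = max(0, min(start, len(protein) - length))
    return protein[start:start + length]


# Hypothetical 22-residue protein used only for demonstration.
EXAMPLE_PROTEIN = "MKTAYIAKQRQISFVKSHFSRQ"

centered = extract_window(EXAMPLE_PROTEIN, 10, 3)    # mutation in the middle
near_start = extract_window(EXAMPLE_PROTEIN, 1, 3)   # window shifted right
near_end = extract_window(EXAMPLE_PROTEIN, 21, 3)    # window shifted left
```

In all three calls the returned sequence has the same length (7 residues); only the position of the mutated residue within the window changes.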
Generate Aggregated Report
usage: pvacseq generate_aggregated_report [-h] input_file output_file

Generate an aggregated report from a pVACseq .all_epitopes.tsv report file.

positional arguments:
  input_file   A pVACseq .all_epitopes.tsv report file
  output_file  The file path to write the aggregated report tsv to

optional arguments:
  -h, --help   show this help message and exit
This tool produces an aggregated version of the all_epitopes TSV. It finds the best-scoring epitope for each variant (the one with the lowest binding affinity score, i.e. the strongest predicted binding) and outputs additional binding affinity, expression, and coverage information for that epitope. It also reports the total number of well-scoring epitopes for each variant, the number of transcripts covered by those epitopes, and the HLA alleles to which those epitopes bind well. Lastly, the report bins variants into tiers that suggest how suitable each variant is for use in vaccines. For a full definition of these tiers, see the pVACseq output file documentation.
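The per-variant aggregation described above can be sketched in Python as follows. This is an illustration of the idea, not the tool's implementation: the column names (`variant`, `transcript`, `allele`, `epitope`, `score`), the 500 cutoff for a "well-scoring" epitope, and the example rows are all assumptions, and the tiering step is deliberately left out. The real all_epitopes TSV uses its own column names, described in the pVACseq output file documentation.

```python
import csv
import io
from collections import defaultdict

# Assumed cutoff for a "well-scoring" epitope; lower scores mean stronger binding.
BINDING_THRESHOLD = 500.0


def aggregate(tsv_text: str, threshold: float = BINDING_THRESHOLD) -> list:
    """Summarize epitope rows per variant (hypothetical column names)."""
    rows_by_variant = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        rows_by_variant[row["variant"]].append(row)

    report = []
    for variant, rows in rows_by_variant.items():
        # Best epitope: the row with the lowest binding affinity score.
        best = min(rows, key=lambda r: float(r["score"]))
        # Well-scoring rows: those below the assumed threshold.
        good = [r for r in rows if float(r["score"]) < threshold]
        report.append({
            "variant": variant,
            "best_epitope": best["epitope"],
            "best_score": float(best["score"]),
            "best_allele": best["allele"],
            "num_good_epitopes": len({r["epitope"] for r in good}),
            "num_transcripts": len({r["transcript"] for r in good}),
            "alleles": sorted({r["allele"] for r in good}),
        })
    return report


# Made-up rows for demonstration only.
_ROWS = [
    ("variant", "transcript", "allele", "epitope", "score"),
    ("chr1:100 A>T", "T1", "HLA-A*02:01", "KLMNPQRST", "35.0"),
    ("chr1:100 A>T", "T2", "HLA-A*02:01", "LMNPQRSTV", "420.0"),
    ("chr1:100 A>T", "T1", "HLA-B*07:02", "KLMNPQRST", "900.0"),
    ("chr2:200 G>C", "T3", "HLA-B*07:02", "AYIAKQRQI", "1200.0"),
]
EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in _ROWS)

summary = aggregate(EXAMPLE_TSV)
```

Each entry in `summary` corresponds to one variant, so a variant with no epitope under the threshold still appears, with zero well-scoring epitopes and an empty allele list.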