Optional Downstream Analysis Tools
Generate Protein Fasta
usage: pvacseq generate_protein_fasta [-h] [--input-tsv INPUT_TSV]
                                      [--mutant-only]
                                      [-d DOWNSTREAM_SEQUENCE_LENGTH]
                                      input_vcf peptide_sequence_length
                                      output_file

positional arguments:
  input_vcf             A VEP-annotated single-sample VCF containing
                        transcript, Wildtype protein sequence, and Downstream
                        protein sequence information.
  peptide_sequence_length
                        Length of the peptide sequence to use when creating
                        the FASTA.
  output_file           The output fasta file.

optional arguments:
  -h, --help            show this help message and exit
  --input-tsv INPUT_TSV
                        A pVACseq all_epitopes or filtered TSV file with
                        epitopes to use for subsetting the input VCF to
                        peptides of interest. Only the peptide sequences for
                        the epitopes in the TSV will be used when creating the
                        FASTA. (default: None)
  --mutant-only         Only output mutant peptide sequences (default: False)
  -d DOWNSTREAM_SEQUENCE_LENGTH, --downstream-sequence-length DOWNSTREAM_SEQUENCE_LENGTH
                        Cap to limit the downstream sequence length for
                        frameshifts when creating the fasta file. Use 'full'
                        to include the full downstream sequence. (default:
                        1000)
This tool extracts protein sequences surrounding supported protein-altering variants in an input VCF file. One use case is selecting long peptides that contain short neoepitope candidates. For example, if pVACseq was run to predict nonamers (9-mers) that are good binders, the user may wish to select long peptide sequences (e.g. 24-mers) that contain those nonamers, for synthesis or for encoding in a DNA vector.
The extracted protein sequence corresponds to the transcript sequence used in the annotated VCF. The alteration in the VCF (e.g. a somatic missense SNV) is centered in the returned protein sequence when possible. If the variant is near the beginning or end of the CDS, it is placed as close to the center as possible while still returning the requested peptide sequence length. If the variant causes a frameshift, the downstream protein sequence is included up to the cap set by --downstream-sequence-length (1000 by default); pass 'full' to include the entire downstream sequence.
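The centering behavior described above can be sketched as a small window calculation. This is a minimal illustration, not pVACseq's own implementation: `centered_window` and `cap_downstream` are hypothetical helpers, positions are assumed to be 0-based residue indices in the translated protein, and the frameshift cap is assumed to simply truncate the downstream sequence to the given number of residues.

```python
from typing import Tuple, Union


def centered_window(protein_length: int, variant_pos: int, window_length: int) -> Tuple[int, int]:
    """Return 0-based, half-open (start, end) bounds of a window of
    window_length residues around variant_pos (hypothetical helper).

    The variant is centered when possible; near either end of the protein
    the window is shifted inward so it still has the requested length.
    With an even window_length the variant has one more residue before it
    than after it.
    """
    if window_length >= protein_length:
        # The protein is too short to trim: return all of it.
        return 0, protein_length
    start = variant_pos - window_length // 2
    # Shift right if the window would start before the first residue.
    start = max(start, 0)
    # Shift left if the window would run past the last residue.
    start = min(start, protein_length - window_length)
    return start, start + window_length


def cap_downstream(downstream: str, cap: Union[int, str] = 1000) -> str:
    """Truncate a frameshift's downstream sequence to at most `cap` residues.

    Assumption: 'full' keeps the whole sequence, mirroring the -d option.
    """
    if cap == "full":
        return downstream
    return downstream[: int(cap)]


if __name__ == "__main__":
    print(centered_window(100, 50, 25))     # (38, 63): 12 residues on each side
    print(centered_window(100, 3, 25))      # (0, 25): clamped at the protein start
    print(centered_window(100, 98, 25))     # (75, 100): clamped at the protein end
    print(cap_downstream("MKVLAAGIPR", 4))  # MKVL
```

Slicing the protein with the returned bounds (`protein[start:end]`) yields the peptide; clamping the start index twice keeps the window length fixed instead of shrinking it at the ends.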
Generate Condensed, Ranked Report
usage: pvacseq generate_condensed_ranked_report [-h] [-m {lowest,median}]
                                                input_file output_file

positional arguments:
  input_file            A pVACseq .all_epitopes.tsv or .filtered.tsv report
                        file
  output_file           The file path to write the condensed, ranked report
                        tsv to

optional arguments:
  -h, --help            show this help message and exit
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use for ranking epitopes by
                        binding-threshold and minimum fold change. lowest: Use
                        the best MT Score and Corresponding Fold Change (i.e.
                        the lowest MT ic50 binding score and corresponding
                        fold change of all chosen prediction methods). median:
                        Use the median MT Score and Median Fold Change (i.e.
                        the median MT ic50 binding score and fold change of
                        all chosen prediction methods). (default: median)
This tool produces a condensed version of the input report (an all_epitopes or filtered TSV) that keeps only the most important columns and adds a score for each neoepitope candidate, which is used to rank the candidates. Refer to the Output Files section for more details on the format of this report.
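The difference between the two --top-score-metric choices can be sketched as follows. This is a simplified illustration, not the code pVACseq uses: `top_score` is a hypothetical helper, each epitope's predictions are assumed to be available as dictionaries keyed by prediction method name, and the method names and values in the demo are made up for illustration.

```python
from statistics import median
from typing import Dict, Tuple


def top_score(
    mt_ic50: Dict[str, float],
    fold_change: Dict[str, float],
    metric: str = "median",
) -> Tuple[float, float]:
    """Summarize one epitope's per-method predictions as (MT score, fold change).

    Hypothetical helper: mt_ic50 and fold_change map each prediction method
    name to that method's MT IC50 and fold change for the same epitope.
    """
    if metric == "lowest":
        # Best (lowest) MT IC50, paired with the fold change from the same method.
        best_method = min(mt_ic50, key=mt_ic50.__getitem__)
        return mt_ic50[best_method], fold_change[best_method]
    if metric == "median":
        # Medians are taken independently across all chosen methods.
        return median(list(mt_ic50.values())), median(list(fold_change.values()))
    raise ValueError(f"unknown metric: {metric!r}")


if __name__ == "__main__":
    # Hypothetical per-method values for a single epitope (illustration only).
    mt = {"method_a": 120.0, "method_b": 45.0, "method_c": 300.0}
    fc = {"method_a": 2.0, "method_b": 5.5, "method_c": 0.8}
    print(top_score(mt, fc, "lowest"))  # (45.0, 5.5)
    print(top_score(mt, fc, "median"))  # (120.0, 2.0)
```

Note that `lowest` keeps the fold change paired with the winning method, while `median` summarizes the scores and fold changes separately, so the two medians may come from different methods.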