Optional Downstream Analysis Tools

Generate Protein Fasta

usage: pvacseq generate_protein_fasta [-h] [--input-tsv INPUT_TSV]
                                      [--mutant-only]
                                      [-d DOWNSTREAM_SEQUENCE_LENGTH]
                                      input_vcf peptide_sequence_length
                                      output_file

positional arguments:
  input_vcf             A VEP-annotated single-sample VCF containing
                        transcript, Wildtype protein sequence, and Downstream
                        protein sequence information.
  peptide_sequence_length
                        Length of the peptide sequence to use when creating
                        the FASTA.
  output_file           The output fasta file.

optional arguments:
  -h, --help            show this help message and exit
  --input-tsv INPUT_TSV
                        A pVACseq all_epitopes or filtered TSV file with
                        epitopes to use for subsetting the input VCF to
                        peptides of interest. Only the peptide sequences for
                        the epitopes in the TSV will be used when creating the
                        FASTA. (default: None)
  --mutant-only         Only output mutant peptide sequences (default: False)
  -d DOWNSTREAM_SEQUENCE_LENGTH, --downstream-sequence-length DOWNSTREAM_SEQUENCE_LENGTH
                        Cap to limit the downstream sequence length for
                        frameshifts when creating the fasta file. Use 'full'
                        to include the full downstream sequence. (default:
                        1000)

This tool extracts protein sequences surrounding supported protein-altering variants in an input VCF file. One use case is selecting long peptides that contain short neoepitope candidates. For example, if pVACseq was run to predict nonamers (9-mers) that are good binders, the user may wish to select long peptide (e.g. 24-mer) sequences that contain the nonamer for synthesis or encoding in a DNA vector. The protein sequence extracted corresponds to the transcript sequence used in the annotated VCF. The alteration in the VCF (e.g. a somatic missense SNV) is centered in the returned protein sequence where possible. If the variant is near the beginning or end of the CDS, it is placed as close to the center as possible while still returning the desired protein sequence length. If the variant causes a frameshift, the downstream protein sequence is returned up to the limit set by --downstream-sequence-length (default: 1000); pass 'full' to return the full downstream sequence.
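The centering and capping behavior described above can be illustrated with a minimal Python sketch. This is not pVACseq's actual implementation; the function names `centered_window` and `cap_downstream`, the 0-based `variant_index`, and the toy alphabet "protein" are all hypothetical and chosen only for illustration.

```python
def centered_window(protein: str, variant_index: int, length: int) -> str:
    """Return a `length`-residue slice of `protein` with the residue at
    `variant_index` (0-based) as close to the center as possible.

    Hypothetical helper for illustration; not part of pVACseq.
    """
    if length >= len(protein):
        # The protein is shorter than the requested window: return it whole.
        return protein
    # Ideal start position puts the variant in the middle of the window.
    start = variant_index - length // 2
    # Clamp the start so the window never runs past either end.
    start = max(0, min(start, len(protein) - length))
    return protein[start:start + length]


def cap_downstream(downstream: str, limit="full") -> str:
    """Truncate a frameshift's downstream sequence to `limit` residues,
    or keep it whole when `limit` is the string 'full'.

    Hypothetical helper mirroring the -d option's two modes.
    """
    if limit == "full":
        return downstream
    return downstream[:int(limit)]


protein = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # toy sequence, not real amino acids
print(centered_window(protein, 12, 5))  # variant 'M' in the middle: KLMNO
print(centered_window(protein, 0, 5))   # variant at the start: ABCDE
print(centered_window(protein, 25, 5))  # variant at the end: VWXYZ
```

In the second and third calls the variant cannot be centered, so the window is shifted just far enough to keep the requested length, which matches the "as close to center as possible" rule.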

Generate Condensed, Ranked Report

usage: pvacseq generate_condensed_ranked_report [-h] [-m {lowest,median}]
                                                input_file output_file

positional arguments:
  input_file            A pVACseq .all_epitopes.tsv or .filtered.tsv report
                        file
  output_file           The file path to write the condensed, ranked report
                        tsv to

optional arguments:
  -h, --help            show this help message and exit
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use for ranking epitopes by
                        binding-threshold and minimum fold change. lowest: Use
                        the best MT Score and Corresponding Fold Change (i.e.
                        the lowest MT ic50 binding score and corresponding
                        fold change of all chosen prediction methods). median:
                        Use the median MT Score and Median Fold Change (i.e.
                        the median MT ic50 binding score and fold change of
                        all chosen prediction methods). (default: median)

This tool produces a condensed version of the filtered TSV that keeps only the most important columns and adds a score for each neoepitope candidate. Refer to the Output Files section for more details on the format of this report.
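To make the difference between the two --top-score-metric choices concrete, here is a small Python sketch. It is not pVACseq's code: the function `top_scores`, the method names, and the score values are hypothetical, and it assumes that fold change is computed as the wildtype (WT) score divided by the mutant (MT) score, which the help text above does not state.

```python
from statistics import median


def top_scores(mt_scores: dict, wt_scores: dict, metric: str = "median"):
    """Return (mt_score, fold_change) for one epitope.

    `mt_scores` and `wt_scores` map a prediction method name to its ic50
    score. Hypothetical helper for illustration; not part of pVACseq.
    """
    if metric == "lowest":
        # Pick the method with the lowest (best) MT score and use the WT
        # score from that same method for the corresponding fold change.
        best_method = min(mt_scores, key=mt_scores.get)
        mt = mt_scores[best_method]
        wt = wt_scores[best_method]
    elif metric == "median":
        # Take the median across all prediction methods for both scores.
        mt = median(mt_scores.values())
        wt = median(wt_scores.values())
    else:
        raise ValueError("metric must be 'lowest' or 'median'")
    # ASSUMPTION: fold change = WT score / MT score.
    return mt, wt / mt


# Made-up scores from three made-up prediction methods.
mt = {"method_a": 100.0, "method_b": 50.0, "method_c": 400.0}
wt = {"method_a": 1000.0, "method_b": 200.0, "method_c": 800.0}

print(top_scores(mt, wt, "lowest"))  # (50.0, 4.0)
print(top_scores(mt, wt, "median"))  # (100.0, 8.0)
```

With these made-up numbers the two metrics give different scores and fold changes for the same epitope, so the choice of metric can change how candidates are ranked.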