Optional Downstream Analysis Tools

Generate Protein Fasta

usage: pvacseq generate_protein_fasta [-h] [--input-tsv INPUT_TSV]
                                      [-p PHASED_PROXIMAL_VARIANTS_VCF]
                                      [--pass-only] [--mutant-only]
                                      [-d DOWNSTREAM_SEQUENCE_LENGTH]
                                      [-s SAMPLE_NAME]
                                      input_vcf flanking_sequence_length
                                      output_file

Generate an annotated fasta file from a VCF with protein sequences of
mutations and matching wildtypes

positional arguments:
  input_vcf             A VEP-annotated single- or multi-sample VCF containing
                        genotype, transcript, Wildtype protein sequence, and
                        Downstream protein sequence information. The VCF may be
                        gzipped (requires tabix index).
  flanking_sequence_length
                        Number of amino acids to add on each side of the
                        mutation when creating the FASTA.
  output_file           The output fasta file.

optional arguments:
  -h, --help            show this help message and exit
  --input-tsv INPUT_TSV
                        A pVACseq all_epitopes or filtered TSV file with
                        epitopes to use for subsetting the input VCF to
                        peptides of interest. Only the peptide sequences for
                        the epitopes in the TSV will be used when creating the
                        FASTA. (default: None)
  -p PHASED_PROXIMAL_VARIANTS_VCF, --phased-proximal-variants-vcf PHASED_PROXIMAL_VARIANTS_VCF
                        A VCF with phased proximal variant information to
                        incorporate into the predicted fasta sequences. Must
                        be gzipped and tabix indexed. (default: None)
  --pass-only           Only process VCF entries with a PASS status. (default:
                        False)
  --mutant-only         Only output mutant peptide sequences (default: False)
  -d DOWNSTREAM_SEQUENCE_LENGTH, --downstream-sequence-length DOWNSTREAM_SEQUENCE_LENGTH
                        Cap to limit the downstream sequence length for
                        frameshifts when creating the fasta file. Use 'full'
                        to include the full downstream sequence. (default:
                        1000)
  -s SAMPLE_NAME, --sample-name SAMPLE_NAME
                        The name of the sample being processed. Required when
                        processing a multi-sample VCF and must be a sample ID
                        in the input VCF #CHROM header line. (default: None)

This tool extracts the protein sequences surrounding supported protein-altering variants in an input VCF file. One use case is selecting long peptides that contain short neoepitope candidates. For example, if pvacseq was run to predict nonamers (9-mers) that are good binders, the user may wish to select long peptide sequences (e.g. 24-mers) that contain each nonamer, for synthesis or for encoding in a DNA vector. The extracted protein sequence corresponds to the transcript sequence used in the annotated VCF. The alteration in the VCF (e.g. a somatic missense SNV) is centered in the returned protein sequence where possible. If the variant is near the beginning or end of the CDS, it is placed as close to the center as possible while still returning the desired protein sequence length. If the variant causes a frameshift, the downstream protein sequence is returned up to the limit set by --downstream-sequence-length (default: 1000); pass 'full' to that option to return the full downstream sequence.
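The centering behavior described above can be pictured with a small sketch. This is not pVACseq's implementation: the function name `centered_window`, its 0-based `pos` argument, and the toy sequence are assumptions made for illustration, and the sketch only covers a single-residue substitution (no frameshifts or indels).

```python
def centered_window(protein: str, pos: int, flank: int) -> str:
    """Return up to 2*flank + 1 residues around the 0-based index `pos`.

    The window is centered on `pos` when possible and shifted inward
    when `pos` is too close to either end of the sequence.
    """
    if not 0 <= pos < len(protein):
        raise ValueError("pos must index a residue in protein")
    length = 2 * flank + 1
    if len(protein) <= length:
        # The sequence is shorter than the requested window: return all of it.
        return protein
    # Clamp the window start so the window never runs off either end.
    start = max(0, min(pos - flank, len(protein) - length))
    return protein[start:start + length]


toy = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # hypothetical sequence, not a real protein
print(centered_window(toy, 12, 3))  # window centered on 'M'
print(centered_window(toy, 1, 3))   # window shifted because 'B' is near the start
print(centered_window(toy, 25, 3))  # window shifted because 'Z' is at the end
```

In this sketch `flank` plays the role of flanking_sequence_length: a value of 3 yields a 7-residue window, and the mutated position drifts off center only when the sequence boundary forces it to.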

Generate Aggregated Report

usage: pvacseq generate_aggregated_report [-h] [--tumor-purity TUMOR_PURITY]
                                          input_file output_file

Generate an aggregated report from a pVACseq .all_epitopes.tsv report file.

positional arguments:
  input_file            A pVACseq .all_epitopes.tsv report file
  output_file           The file path to write the aggregated report tsv to

optional arguments:
  -h, --help            show this help message and exit
  --tumor-purity TUMOR_PURITY
                        Value between 0 and 1 indicating the fraction of tumor
                        cells in the tumor sample. Information is used during
                        aggregate report creation for a simple estimation of
                        whether variants are subclonal or clonal based on VAF.
                        If not provided, purity is estimated directly from the
                        VAFs. (default: None)

This tool produces an aggregated version of the all_epitopes TSV. For each variant, it finds the best-scoring epitope (the one with the lowest binding affinity score, i.e. the strongest predicted binder) and outputs additional binding affinity, expression, and coverage information for that epitope. It also reports the total number of well-scoring epitopes for each variant, the number of transcripts covered by those epitopes, and the HLA alleles those epitopes bind well to. Lastly, the report bins variants into tiers that suggest how suitable each variant is for use in vaccines. For a full definition of these tiers, see the pVACseq output file documentation.
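The per-variant aggregation can be sketched as a group-and-minimize step. The following is a simplified illustration, not the tool's actual logic: the row keys (`variant`, `epitope`, `allele`, `ic50`), the example values, and the 500 cutoff used to call an epitope "well-scoring" are all assumptions, and tiering, expression, and coverage are left out entirely.

```python
from collections import defaultdict

# Hypothetical rows standing in for lines of an all_epitopes TSV.
# The keys are illustrative and do not match the real column headers.
rows = [
    {"variant": "chr1:100 A>T", "epitope": "KLMNPQRST", "allele": "HLA-A*02:01", "ic50": 42.0},
    {"variant": "chr1:100 A>T", "epitope": "LMNPQRSTV", "allele": "HLA-B*07:02", "ic50": 310.0},
    {"variant": "chr1:100 A>T", "epitope": "MNPQRSTVW", "allele": "HLA-A*02:01", "ic50": 2500.0},
    {"variant": "chr2:200 G>C", "epitope": "GILGFVFTL", "allele": "HLA-A*02:01", "ic50": 880.0},
]


def best_epitope_per_variant(rows, binding_threshold=500.0):
    """Summarize each variant by its lowest-scoring (strongest-binding) epitope."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["variant"]].append(row)

    summary = {}
    for variant, group in grouped.items():
        best = min(group, key=lambda r: r["ic50"])  # lower score = stronger binder
        good = [r for r in group if r["ic50"] < binding_threshold]
        summary[variant] = {
            "best_epitope": best["epitope"],
            "best_ic50": best["ic50"],
            "num_good_binders": len(good),
            "alleles": sorted({r["allele"] for r in good}),
        }
    return summary


for variant, info in best_epitope_per_variant(rows).items():
    print(variant, info)
```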

Calculate Reference Proteome Similarity

usage: pvacseq calculate_reference_proteome_similarity [-h]
                                                       [--match-length MATCH_LENGTH]
                                                       [--species SPECIES]
                                                       [--blastp-path BLASTP_PATH]
                                                       [--blastp-db {refseq_select_prot,refseq_protein}]
                                                       [-t N_THREADS]
                                                       input_file input_fasta
                                                       output_file

Blast peptides against the reference proteome.

positional arguments:
  input_file            Input filtered or all_epitopes file with predicted
                        epitopes.
  input_fasta           For pVACbind, the original input FASTA file. For
                        pVACseq and pVACfuse a FASTA file with mutant peptide
                        sequences for each variant isoform. This file can be
                        found in the same directory as the input
                        filtered/all_epitopes file. Can also be generated by
                        running `pvacseq|pvacfuse generate_protein_fasta`.
  output_file           Output TSV filename for putative neoepitopes.

optional arguments:
  -h, --help            show this help message and exit
  --match-length MATCH_LENGTH
                        The desired matching epitope length. (default: 8)
  --species SPECIES     The species of the input file. (default: human)
  --blastp-path BLASTP_PATH
                        Blastp installation path. (default: None)
  --blastp-db {refseq_select_prot,refseq_protein}
                        The blastp database to use. (default:
                        refseq_select_prot)
  -t N_THREADS, --n-threads N_THREADS
                        Number of threads to use for parallelizing BLAST
                        calls. (default: 1)

Given a pVACseq run’s FASTA and filtered/all_epitopes TSV, this tool will BLAST peptides against the relevant reference proteome and return the results in an output TSV and reference_match file pair. Typically, this step is run as part of the pVACseq pipeline on the filtered output TSV, if specified. This tool provides a standalone way to run it on pVACseq’s generated filtered/all_epitopes TSV files. For instance, this may be useful if pvacseq was originally run without this step and you want to add it to the filtered TSV after the fact, or if you want the results for the all_epitopes TSV, which does not have this step performed during a pipeline run. For a closer look at the generated reference_match file, see the pVACseq output file documentation.
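When running this step after the fact from a script, it can help to assemble the command line programmatically. The sketch below only builds and prints the command using the options listed in the usage text above; the helper name `similarity_command` and all file names are hypothetical, and nothing is executed.

```python
import shlex

# Database choices as listed in the usage text above.
BLASTP_DBS = ("refseq_select_prot", "refseq_protein")


def similarity_command(input_tsv, input_fasta, output_tsv,
                       match_length=8, blastp_db="refseq_select_prot",
                       n_threads=1):
    """Build the argument list for a standalone similarity run."""
    if blastp_db not in BLASTP_DBS:
        raise ValueError(f"blastp_db must be one of {BLASTP_DBS}")
    return [
        "pvacseq", "calculate_reference_proteome_similarity",
        "--match-length", str(match_length),
        "--blastp-db", blastp_db,
        "-t", str(n_threads),
        input_tsv, input_fasta, output_tsv,
    ]


# Hypothetical file names for illustration only.
cmd = similarity_command(
    "sample.all_epitopes.tsv",
    "sample.fasta",
    "sample.all_epitopes.similarity.tsv",
    n_threads=4,
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where pvacseq and blastp are installed.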

NetChop Predict Cleavage Sites

usage: pvacseq net_chop [-h] [--method {cterm,20s}] [--threshold THRESHOLD]
                        input_file input_fasta output_file

Predict cleavage sites for neoepitopes.

positional arguments:
  input_file            Input filtered file with predicted epitopes.
  input_fasta           The required fasta file.
  output_file           Output tsv filename for putative neoepitopes.

optional arguments:
  -h, --help            show this help message and exit
  --method {cterm,20s}  NetChop prediction method to use ("cterm" for C term
                        3.0, "20s" for 20S 3.0). (default: cterm)
  --threshold THRESHOLD
                        NetChop prediction threshold. (default: 0.5)

This tool uses NetChop to predict cleavage sites for neoepitopes from a pVACseq run’s filtered/all_epitopes TSV. Its output adds three columns to the TSV: Best Cleavage Position, Best Cleavage Score, and a Cleavage Sites list. Typically, this step is run as part of the pVACseq pipeline on the filtered output TSV, when specified. This tool provides a way to run it manually on pVACseq’s generated filtered/all_epitopes TSV files so that you can add this information when it is not already present. For more about these columns, see the pVACseq output file documentation.
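To make the relationship between the threshold and the three added columns concrete, here is a rough sketch of how per-position cleavage scores could be reduced to those values. It is an illustration under assumptions, not the tool's code: the function `summarize_cleavage`, the `position:score` string format, the use of `>=` for the threshold comparison, and the example scores are all hypothetical.

```python
def summarize_cleavage(scores, threshold=0.5):
    """Reduce per-residue cleavage scores to three summary values.

    `scores[0]` is treated as the score for position 1 of the peptide.
    Only positions scoring at or above `threshold` count as cleavage sites.
    """
    sites = [(pos, score)
             for pos, score in enumerate(scores, start=1)
             if score >= threshold]
    if not sites:
        # No predicted site passes the threshold.
        return {"Best Cleavage Position": None,
                "Best Cleavage Score": None,
                "Cleavage Sites": ""}
    best_pos, best_score = max(sites, key=lambda site: site[1])
    return {
        "Best Cleavage Position": best_pos,
        "Best Cleavage Score": best_score,
        "Cleavage Sites": ",".join(f"{pos}:{score}" for pos, score in sites),
    }


example_scores = [0.1, 0.62, 0.3, 0.97, 0.45]  # made-up values
print(summarize_cleavage(example_scores))
print(summarize_cleavage(example_scores, threshold=0.99))
```

Raising the threshold shrinks the site list, which is the effect the --threshold option is described as controlling.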

NetMHCStab Predict Stability

usage: pvacseq netmhc_stab [-h] [-m {lowest,median}] input_file output_file

Add stability predictions to predicted neoepitopes.

positional arguments:
  input_file            Input filtered file with predicted epitopes.
  output_file           Output TSV filename for putative neoepitopes.

optional arguments:
  -h, --help            show this help message and exit
  -m {lowest,median}, --top-score-metric {lowest,median}
                        The ic50 scoring metric to use when sorting epitopes.
                        lowest: Use the best MT Score and Corresponding Fold
                        Change (i.e. the lowest MT ic50 binding score and
                        corresponding fold change of all chosen prediction
                        methods). median: Use the median MT Score and Median
                        Fold Change (i.e. the median MT ic50 binding score and
                        fold change of all chosen prediction methods).
                        (default: median)

This tool uses NetMHCstabpan to add stability predictions for neoepitopes from a pVACseq run’s filtered/all_epitopes TSV. Its output adds four columns to the TSV: Predicted Stability, Half Life, Stability Rank, and NetMHCStab Allele. Typically, this step is run as part of the pVACseq pipeline on the filtered output TSV, when specified. This tool provides a way to run it manually on pVACseq’s generated filtered/all_epitopes TSV files so that you can add this information when it is not already present. For more about these columns, see the pVACseq output file documentation.
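The difference between the two --top-score-metric choices can be shown with a few lines of arithmetic. This is a minimal sketch assuming one IC50 value per prediction method for a single epitope; the function `top_score`, the method names, and the values are invented for illustration.

```python
from statistics import median


def top_score(ic50_by_method, metric="median"):
    """Pick the representative IC50 for one epitope across prediction methods."""
    values = list(ic50_by_method.values())
    if not values:
        raise ValueError("at least one prediction is required")
    if metric == "lowest":
        return min(values)      # best (lowest) score of any method
    if metric == "median":
        return median(values)   # middle score across all methods
    raise ValueError("metric must be 'lowest' or 'median'")


# Hypothetical predictions for a single epitope.
predictions = {"MethodA": 120.0, "MethodB": 45.0, "MethodC": 300.0}
print(top_score(predictions, "lowest"))
print(top_score(predictions, "median"))
```

With these made-up numbers, "lowest" follows the single most optimistic method, while "median" is less sensitive to one outlying prediction.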