
Optional Downstream Analysis Tools

Generate Protein Fasta

usage: pvacfuse generate_protein_fasta [-h] [--input-tsv INPUT_TSV]
                                       [-d DOWNSTREAM_SEQUENCE_LENGTH]
                                       input_file flanking_sequence_length
                                       output_file

Generate an annotated fasta file from Integrate-Neo or AGFusion output.

positional arguments:
  input_file            An INTEGRATE-Neo annotated bedpe file with fusions or
                        a AGfusion output directory.
  flanking_sequence_length
                        Number of amino acids to add on each side of the
                        mutation when creating the FASTA.
  output_file           The output fasta file.

optional arguments:
  -h, --help            show this help message and exit
  --input-tsv INPUT_TSV
                        A pVACfuse all_epitopes or filtered TSV file with
                        epitopes to use for subsetting the input file to
                        peptides of interest. Only the peptide sequences for
                        the epitopes in the TSV will be used when creating the
                        FASTA. (default: None)
  -d DOWNSTREAM_SEQUENCE_LENGTH, --downstream-sequence-length DOWNSTREAM_SEQUENCE_LENGTH
                        Cap to limit the downstream sequence length for
                        frameshift fusion when creating the fasta file. Use
                        'full' to include the full downstream sequence.
                        (default: 1000)
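
As an illustration of how the arguments above fit together, here is a minimal Python sketch that assembles a command line for this tool. It uses only the flags and positional arguments shown in the usage text. The helper name `build_generate_protein_fasta_cmd`, the file names, and the flanking length of 12 are hypothetical, chosen for the example. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_generate_protein_fasta_cmd(input_file, flanking_sequence_length, output_file,
                                     input_tsv=None, downstream_sequence_length=None):
    """Assemble the argument list for `pvacfuse generate_protein_fasta`.

    Optional flags are added only when a value is supplied, so the tool's
    own defaults apply otherwise.
    """
    cmd = ["pvacfuse", "generate_protein_fasta"]
    if input_tsv is not None:
        cmd += ["--input-tsv", str(input_tsv)]
    if downstream_sequence_length is not None:
        # Either an integer cap or the literal string 'full'.
        cmd += ["-d", str(downstream_sequence_length)]
    # Positional arguments, in the order given by the usage line.
    cmd += [str(input_file), str(flanking_sequence_length), str(output_file)]
    return cmd


# Hypothetical inputs: an AGFusion output directory and a filtered TSV.
cmd = build_generate_protein_fasta_cmd(
    "agfusion_output", 12, "fusions.fa",
    input_tsv="sample.filtered.tsv",
    downstream_sequence_length="full",
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` in an environment where pvacfuse is installed.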

This tool extracts the protein sequence surrounding each fusion by parsing Integrate-Neo or AGFusion output. One use case is selecting long peptides that contain short neoepitope candidates: for example, if pvacfuse was run to predict nonamers (9-mers) that are good binders, the user may wish to select long peptide (e.g. 24-mer) sequences that contain the nonamer for synthesis or encoding in a DNA vector. The fusion position will be centered in the returned protein sequence where possible. If the fusion causes a frameshift, the downstream protein sequence is returned up to the limit set by --downstream-sequence-length (default: 1000, per the usage above); pass 'full' to include the entire downstream sequence.
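
The windowing behavior described above can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not pVACfuse's actual implementation: the function name `flanking_window`, the 0-based `fusion_index` convention (index of the first residue after the junction), and the toy alphabet "protein" are all hypothetical.

```python
def flanking_window(protein, fusion_index, flank, frameshift=False, downstream_cap=1000):
    """Return the part of `protein` around the fusion junction.

    fusion_index   -- 0-based index of the first residue after the junction (assumed convention)
    flank          -- number of residues to keep on each side of the junction
    frameshift     -- if True, keep the downstream sequence up to `downstream_cap`
    downstream_cap -- maximum downstream residues for frameshifts; None means no cap
    """
    # Clamp the upstream side at the start of the protein, so the junction
    # is centered only when enough sequence is available.
    start = max(0, fusion_index - flank)
    if frameshift:
        if downstream_cap is None:
            end = len(protein)
        else:
            end = min(len(protein), fusion_index + downstream_cap)
    else:
        end = min(len(protein), fusion_index + flank)
    return protein[start:end]


toy = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # stand-in for a fusion protein sequence

centered = flanking_window(toy, 13, 4)                       # junction in the middle
near_start = flanking_window(toy, 2, 4)                      # too close to the start to center
fs_full = flanking_window(toy, 20, 4, frameshift=True, downstream_cap=None)
fs_capped = flanking_window(toy, 20, 4, frameshift=True, downstream_cap=3)

# A short epitope spanning the junction should appear inside the longer window.
assert "MNO" in centered
```

With a real sequence, the same containment check can confirm that a predicted nonamer falls within the long peptide selected for synthesis.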