OpenMS
Create a decoy peptide database from standard FASTA databases.
Decoy databases are useful to control false discovery rates and thus estimate score cutoffs for identified spectra.
The decoy can be generated by either reversing or shuffling each peptide of a sequence (as defined by a given enzyme). When reversing, the N- and C-termini of the peptides are kept in position by default.
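As a rough illustration of peptide-level reversal, here is a minimal Python sketch. It is not the tool's implementation: the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), the function names, and the reading of "termini kept in position" as keeping the first and last residue of each peptide fixed are all assumptions made for this example.

```python
def tryptic_peptides(protein):
    """Split a protein after K or R unless the next residue is P.

    Simplified, assumed cleavage rule for illustration only.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing peptide without a cleavage site at its end.
        peptides.append(protein[start:])
    return peptides


def reverse_peptide(peptide, keep_n_term=True, keep_c_term=True):
    """Reverse a peptide, optionally keeping its first/last residue in place."""
    start = 1 if keep_n_term else 0
    end = len(peptide) - 1 if keep_c_term else len(peptide)
    if end <= start:
        # Too short to have an interior; nothing to reverse.
        return peptide
    return peptide[:start] + peptide[start:end][::-1] + peptide[end:]


def reverse_decoy(protein):
    """Build a decoy sequence by reversing every peptide of the protein."""
    return "".join(reverse_peptide(p) for p in tryptic_peptides(protein))


print(reverse_decoy("MABCKDEFR"))
```

Because both termini stay fixed in this sketch, the cleavage sites of the decoy sequence coincide with those of the target.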
To get a 'contaminants' database have a look at http://www.thegpm.org/crap/index.html or find/create your own contaminant database.
Multiple databases can be provided as input, which will internally be concatenated before being used for decoy generation. This allows you to specify your target database plus a contaminant file and obtain a concatenated target-decoy database using a single call, e.g., DecoyDatabase -in human.fasta crap.fasta -out human_TD.fasta
By default, a combined database is created in which target and decoy sequences are written interleaved (i.e., target1, decoy1, target2, decoy2, ...). If you need all targets before the decoys, use only_decoy and concatenate the files externally.
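The two layouts can be pictured with a small Python sketch. It only illustrates the ordering of records and is not how the tool writes its output; the record representation as (identifier, sequence) tuples, the function names, and the `DECOY_` prefix in the usage lines are assumptions for this example.

```python
def interleaved(targets, decoys):
    """Return records ordered as target1, decoy1, target2, decoy2, ..."""
    ordered = []
    for target, decoy in zip(targets, decoys):
        ordered.extend([target, decoy])
    return ordered


def targets_first(targets, decoys):
    """Return all targets followed by all decoys (external concatenation)."""
    return list(targets) + list(decoys)


# Hypothetical records: (identifier, sequence)
targets = [("P1", "MABCK"), ("P2", "DEFR")]
decoys = [("DECOY_P1", "MCBAK"), ("DECOY_P2", "DFER")]

print([ident for ident, _ in interleaved(targets, decoys)])
print([ident for ident, _ in targets_first(targets, decoys)])
```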
The tool will keep track of all protein identifiers and report duplicates.
The tool also automatically checks for decoys already present in the input files (based on the most common prefixes and suffixes) and terminates if any are found.
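The two checks described above (duplicate identifiers and pre-existing decoys) could look roughly like the following Python sketch. The affix list, the underscore convention, and the function names are hypothetical and chosen for illustration; the actual set of prefixes and suffixes the tool recognizes is not reproduced here.

```python
# Illustrative affixes only; NOT the tool's actual list.
COMMON_DECOY_AFFIXES = ("decoy", "rev", "reverse", "shuffle")


def find_duplicates(identifiers):
    """Return identifiers that occur more than once, in first-repeat order."""
    seen, duplicates = set(), []
    for ident in identifiers:
        if ident in seen and ident not in duplicates:
            duplicates.append(ident)
        seen.add(ident)
    return duplicates


def looks_like_decoy(identifier):
    """Case-insensitive check for a decoy-like prefix or suffix."""
    lowered = identifier.lower()
    return any(
        lowered.startswith(affix + "_") or lowered.endswith("_" + affix)
        for affix in COMMON_DECOY_AFFIXES
    )


ids = ["sp|P12345|ALBU_HUMAN", "DECOY_sp|P12345", "sp|P12345|ALBU_HUMAN"]
print(find_duplicates(ids))
print([i for i in ids if looks_like_decoy(i)])
```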
Extra functionality: The Neighbor Peptide functionality (see subsection 'NeighborSearch') is designed to find peptides (neighbors) in a given set of sequences (FASTA file) that are similar to a target peptide (aka relevant peptide) based on mass and spectral characteristics. This provides more power when searching complex samples in which only a subset of the peptides/proteins is of interest. See www.ncbi.nlm.nih.gov/pmc/articles/PMC8489664/ and NeighborSeq for details.
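To give a feel for the mass part of the neighbor criterion only, the following Python sketch selects candidate peptides whose monoisotopic mass lies within a tolerance of a relevant peptide. It is a simplification under stated assumptions: the spectral-similarity criterion is omitted entirely, the tolerance value and function names are hypothetical, and none of this reflects the tool's or NeighborSeq's actual code.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def mass_neighbors(relevant, candidates, tolerance_da=0.01):
    """Candidates (other than the peptide itself) within the mass tolerance.

    tolerance_da is a hypothetical value chosen for this example.
    """
    ref = peptide_mass(relevant)
    return [
        c for c in candidates
        if c != relevant and abs(peptide_mass(c) - ref) <= tolerance_da
    ]


print(mass_neighbors("PEPTIDE", ["PEPTLDE", "PEPTIDEK", "EDITPEP"]))
```

In a fuller treatment, mass-matched candidates would additionally be compared on their fragment ions, as described in the publication linked above.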
The command line parameters of this tool are:
INI file documentation of this tool: