We developed a novel method for de novo protein sequencing, called Database Independent Protein Sequencing (DiPS), published in MCP in 2017.


Sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput.

Our method is used for unambiguous, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named “Peptide Tag Assembler.”
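
To give a feel for the final assembly step, the following is a minimal sketch of merging overlapping peptide tags into a longer sequence. It is a toy greedy suffix-prefix merge written for illustration only and is not the Peptide Tag Assembler algorithm; the function names, the minimum overlap of 3 residues, and the example tags are all assumptions.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` residues exists.
    `min_len` is assumed to be >= 1.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def assemble(tags: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of tags with the largest overlap until no
    pair overlaps by at least `min_len` residues (toy illustration only)."""
    unique = sorted(set(tags))  # sorted for a deterministic result
    # Drop tags that are fully contained in another tag.
    contigs = [t for t in unique if not any(t != u and t in u for u in unique)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return sorted(contigs, key=len, reverse=True)
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical tags, as might come from overlapping semi-random cleavage products.
print(assemble(["MKTAYIAK", "YIAKQRQ", "KQRQISF", "AKQR"]))
```

Because semi-random cleavage yields many overlapping peptides, tags like these can be chained through their shared residues. A real assembler would also have to handle sequencing errors and conflicting tags, which this sketch ignores.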

Please see our paper published in MCP 2017.

Sample Preparation

The pure or enriched protein is denatured in 8 M urea, reduced with dithiothreitol (DTT), and alkylated with iodoacetamide (IAA) or iodoacetate (IAc). The buffer is then exchanged to water on 3 kDa molecular weight cutoff filters. The dissolved protein is transferred to a glass vial, and HCl is added to a final concentration of 3 M. The mixture is microwaved for 4 minutes, and the resulting MS-amenable peptides are enriched from the hydrolysate and desalted by solid phase extraction. We also recommend setting aside a small fraction of the original protein sample for a standard tryptic digestion.
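
As a worked example of the acid addition, the sketch below computes how much HCl stock to add to reach a 3 M final concentration. It assumes volumes are additive and that the protein solution contains no acid; the 12 M stock concentration and the 100 µL sample volume are illustrative assumptions, not values from the protocol.

```python
def acid_volume_ul(sample_volume_ul: float,
                   stock_molarity: float = 12.0,   # assumed stock, not from the protocol
                   final_molarity: float = 3.0) -> float:
    """Volume of HCl stock (in µL) to add to a sample to reach `final_molarity`.

    Solves C_stock * V_acid = C_final * (V_sample + V_acid) for V_acid,
    assuming additive volumes and an acid-free sample.
    """
    if stock_molarity <= final_molarity:
        raise ValueError("stock must be more concentrated than the target")
    return final_molarity * sample_volume_ul / (stock_molarity - final_molarity)


# Hypothetical 100 µL of protein in water after buffer exchange.
volume = acid_volume_ul(100.0)
print(f"Add {volume:.1f} µL of 12 M HCl to 100 µL of sample")
```

With a 6 M stock instead, the same formula gives equal volumes of sample and acid.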


The resulting peptides are analyzed by nLC-MS/MS using a 3 h gradient. Multiple runs of the same sample can be performed to increase sequence coverage, including runs with different normalized collision energies (NCE), runs with different fragmentation methods (e.g. CID, ETD, EThcD), and repeated runs using an exclusion list containing all peptides identified in the first analysis.

De-novo sequencing

Raw data is analyzed using the “De-novo” module of the PEAKS 7.0 software. If IAA is used for alkylation prior to hydrolysis, the analysis parameters include no enzyme specificity and no fixed modifications. Variable modifications include methionine oxidation, cysteine carbamidomethylation, cysteine carboxymethylation, and arginine citrullination. If IAc is used for alkylation, the de novo parameters include no enzyme specificity, a fixed modification of cysteine carboxymethylation, and variable modifications of methionine oxidation and arginine citrullination. If specific modifications are expected, they can be added to the de novo parameters accordingly. For example, for monoclonal antibody sequencing, glutamine to pyroglutamate conversion can be included as a variable modification. For analysis of the tryptic digest, trypsin enzyme specificity is defined, along with fixed cysteine carbamidomethylation (IAA) or carboxymethylation (IAc) and variable methionine oxidation.
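
The parameter choices above can be summarized as a small lookup. This is only a bookkeeping sketch: the function name and dictionary layout are hypothetical and do not correspond to any PEAKS file format or API; the settings themselves are entered in the PEAKS interface.

```python
OXIDATION_M = "methionine oxidation"
CARBAMIDOMETHYL_C = "cysteine carbamidomethylation"
CARBOXYMETHYL_C = "cysteine carboxymethylation"
CITRULLINATION_R = "arginine citrullination"


def de_novo_parameters(alkylation: str, tryptic: bool = False,
                       extra_variable: tuple = ()) -> dict:
    """Return the search settings described in the text (hypothetical layout).

    alkylation     -- "IAA" (iodoacetamide) or "IAc" (iodoacetate)
    tryptic        -- True for the standard tryptic digest, False for the
                      acid hydrolysate
    extra_variable -- additional expected variable modifications
    """
    if alkylation not in ("IAA", "IAc"):
        raise ValueError("alkylation must be 'IAA' or 'IAc'")
    if tryptic:
        fixed = [CARBAMIDOMETHYL_C if alkylation == "IAA" else CARBOXYMETHYL_C]
        variable = [OXIDATION_M]
        enzyme = "trypsin"
    elif alkylation == "IAA":
        fixed = []
        variable = [OXIDATION_M, CARBAMIDOMETHYL_C, CARBOXYMETHYL_C, CITRULLINATION_R]
        enzyme = "none"
    else:
        fixed = [CARBOXYMETHYL_C]
        variable = [OXIDATION_M, CITRULLINATION_R]
        enzyme = "none"
    return {"enzyme": enzyme, "fixed": fixed,
            "variable": variable + list(extra_variable)}


# Example: hydrolysate of an IAA-alkylated monoclonal antibody.
print(de_novo_parameters("IAA", extra_variable=("glutamine to pyroglutamate",)))
```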

Once the peptide de novo analysis is done, the “De novo peptide.csv” file is exported from PEAKS without any filtering (ALC=0). This is the input file for pTA. More than one file can be used as input for pTA.

Other de novo sequencing software or other versions of PEAKS may be used to generate de novo peptide sequences, but the .csv file must be formatted to match the PEAKS 7.0 format. An example format file can be found here.
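
When preparing a .csv file from other software, one quick sanity check is to compare its header row against the example format file. The sketch below does only that; the function names and file paths are hypothetical, it assumes both files are comma-separated UTF-8 text with a single header row, and it does not verify that the values in each column are what pTA expects.

```python
import csv


def read_header(path: str) -> list[str]:
    """Return the first row of a CSV file (empty list if the file is empty)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def header_differences(reference_path: str, candidate_path: str) -> dict:
    """Compare the header of a candidate file with the reference example file."""
    reference = read_header(reference_path)
    candidate = read_header(candidate_path)
    return {
        # Columns the reference has but the candidate lacks.
        "missing": [c for c in reference if c not in candidate],
        # Columns only present in the candidate.
        "extra": [c for c in candidate if c not in reference],
        # True only if names and order match exactly.
        "identical": reference == candidate,
    }


if __name__ == "__main__":
    import sys
    # Usage (hypothetical paths): python check_header.py example.csv my_export.csv
    if len(sys.argv) == 3:
        print(header_differences(sys.argv[1], sys.argv[2]))
```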


The pTA executable is available here:

Version 1.0.6