🧬Proteomics

Key Techniques in Protein Sequencing

Why This Matters

Protein sequencing sits at the heart of proteomics—you can't understand what a protein does until you know what it's made of. These techniques connect directly to larger course concepts like structure-function relationships, post-translational modifications, and protein identification in complex biological systems. When you're analyzing a disease biomarker or characterizing an enzyme, the sequencing method you choose determines what information you can extract and how confident you can be in your results.

You're being tested on more than just definitions here. Exam questions will ask you to compare methods based on their mechanisms, explain why one technique works better than another for a given sample, and connect sequencing data to downstream applications. Don't just memorize what each technique does—know why it works, what its limitations are, and when you'd choose it over alternatives.


Chemical Degradation Methods

These classical approaches break down proteins systematically using chemical reactions, releasing amino acids one at a time for identification. The chemistry targets specific reactive groups on amino acids, allowing sequential determination of the primary structure.

Edman Degradation

  • Sequential N-terminal cleavage—removes and identifies one amino acid at a time from the protein's N-terminus using a cyclic chemical process
  • Phenylisothiocyanate (PITC) labels the terminal amino acid, forming a phenylthiohydantoin (PTH) derivative that can be identified by chromatography
  • Limited to ~50 residues due to cumulative incomplete reactions; larger proteins require fragmentation first
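To see why incomplete reactions cap the read length, here is a minimal sketch of the arithmetic. The 95% per-cycle (repetitive) yield is an illustrative assumption, not a figure from this guide; real yields vary by instrument and sample.

```python
def cumulative_yield(repetitive_yield: float, cycles: int) -> float:
    """Fraction of molecules still in phase after a number of Edman cycles."""
    return repetitive_yield ** cycles


if __name__ == "__main__":
    ASSUMED_YIELD = 0.95  # hypothetical per-cycle efficiency
    for cycles in (10, 30, 50):
        remaining = cumulative_yield(ASSUMED_YIELD, cycles)
        print(f"cycle {cycles}: {remaining:.1%} of the sample still gives the correct residue")
```

Under this assumption, less than a tenth of the starting material is still in phase by cycle 50, so the correct PTH signal sinks into the background from out-of-phase molecules.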

N-Terminal Sequencing

  • Determines the starting sequence of a protein, critical for confirming protein identity and detecting signal peptide cleavage
  • Either Edman degradation or mass spectrometry can accomplish this; MS also works on blocked N-termini, which Edman chemistry cannot sequence
  • Can reveal N-terminal modifications like acetylation: these block the free amino group that PITC reacts with, so Edman fails, while MS detects the added mass (+42 Da for an acetyl group)

C-Terminal Sequencing

  • Identifies the protein's end residues, which is technically challenging because no C-terminal chemistry is as efficient or reliable as Edman degradation
  • Carboxypeptidase enzymes sequentially remove C-terminal residues, but timing must be carefully controlled
  • Complementary to N-terminal data when confirming full-length protein expression or detecting C-terminal processing

Compare: N-terminal vs. C-terminal sequencing—both reveal terminus identity, but N-terminal methods (especially Edman) are far more established and reliable. C-terminal approaches often require mass spectrometry for accuracy. If an exam asks about sequencing challenges, blocked or modified termini are your go-to examples.


Mass Spectrometry-Based Approaches

Mass spectrometry revolutionized protein sequencing by measuring peptide masses with extraordinary precision. Ionized molecules are separated by their mass-to-charge ratio (m/z), and fragmentation patterns reveal sequence information.

Mass Spectrometry-Based Sequencing

  • Measures the m/z of ionized peptides to determine molecular mass with high accuracy (well below 1 Da, and within a few parts per million on high-resolution instruments)
  • Handles complex mixtures that would overwhelm chemical methods—thousands of proteins can be analyzed simultaneously
  • Requires complementary techniques like chromatographic separation and database searching for complete protein identification
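A quick sketch of how a peptide's mass relates to the m/z values the instrument reports. It assumes monoisotopic masses, unmodified residues, and protonation as the only charge carrier; the function names are illustrative.

```python
# Monoisotopic residue masses in Da (standard values; cysteine is unmodified).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONO[aa] for aa in seq) + WATER


def mz(seq: str, charge: int) -> float:
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(seq) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"PEPTIDE, charge {z}+: m/z = {mz('PEPTIDE', z):.3f}")
```

The same peptide appears at a different m/z for each charge state, which is why the charge must be assigned before an m/z reading can be converted back to a mass.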

Tandem Mass Spectrometry (MS/MS)

  • Two-stage analysis—the first mass analyzer selects a precursor ion, which is then fragmented and analyzed in the second stage
  • Fragmentation occurs at peptide bonds, generating b-ions (N-terminal fragments) and y-ions (C-terminal fragments) that reveal sequence
  • Gold standard for proteomics identification and quantification; enables confident sequence assignment from complex samples
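The b- and y-ion series can be sketched as a short calculation. This assumes singly charged fragments, monoisotopic masses, and no modifications; `fragment_ions` is an illustrative name, not a function from any particular proteomics package.

```python
# Monoisotopic residue masses in Da (standard values; cysteine is unmodified).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(seq: str) -> tuple[list[float], list[float]]:
    """Return singly charged b-ion and y-ion m/z lists (b1..b(n-1), y1..y(n-1))."""
    masses = [MONO[aa] for aa in seq]
    b_ions, y_ions = [], []
    for i in range(1, len(seq)):
        # b_i keeps the first i residues (N-terminal side of the broken bond).
        b_ions.append(sum(masses[:i]) + PROTON)
        # y_i keeps the last i residues plus the C-terminal water.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:9.4f}    y{i} = {y_mz:9.4f}")
```

Neighboring peaks in either series differ by exactly one residue mass, which is what lets the sequence be read off the spectrum.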

De Novo Peptide Sequencing

  • Determines sequence without database reference—essential for novel proteins, species without sequenced genomes, or unexpected variants
  • Computational algorithms interpret MS/MS fragmentation patterns to reconstruct the amino acid order
  • Cannot readily distinguish isobaric residues like leucine and isoleucine, which have identical residue masses (113.084 Da monoisotopic)
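The core idea behind de novo reading can be sketched with an idealized fragment ladder: the mass difference between neighboring peaks in one ion series is looked up against the residue masses. This is a toy under strong assumptions (a complete, noise-free b-ion series and a hypothetical 0.01 Da tolerance); real de novo tools must cope with missing peaks, noise, and mixed ion types.

```python
# Monoisotopic residue masses in Da (standard values; cysteine is unmodified).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def ideal_b_ladder(seq: str) -> list[float]:
    """Idealized singly charged b-ion m/z values b1..b(n-1), used as test input."""
    ladder, running = [], PROTON
    for aa in seq[:-1]:
        running += MONO[aa]
        ladder.append(running)
    return ladder


def read_ladder(peaks: list[float], tol: float = 0.01) -> list[str]:
    """Assign a residue to each gap between consecutive peaks of one ion series."""
    ordered = sorted(peaks)
    calls = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        matches = [aa for aa, mass in MONO.items() if abs(mass - delta) <= tol]
        # More than one match means the residues cannot be told apart by mass.
        calls.append("/".join(matches) if matches else "?")
    return calls


if __name__ == "__main__":
    print(read_ladder(ideal_b_ladder("PEPTIDE")))
```

The isoleucine in PEPTIDE comes back as the ambiguous call `L/I`, because both residues produce the same gap. The first residue is also not recovered, since the b1 peak has no neighbor below it in this ladder.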

Compare: Database searching vs. de novo sequencing—both use MS/MS data, but database searching matches spectra to known sequences (faster, more confident) while de novo reconstructs sequences from scratch (necessary for unknowns). FRQs may ask when each approach is appropriate.


Sample Preparation Techniques

Before sequencing can occur, proteins must be prepared properly. Enzymatic digestion and chromatographic separation convert complex samples into analyzable peptide mixtures.

Protein Digestion Methods (Trypsin Digestion)

  • Trypsin cleaves after lysine (K) and arginine (R), though generally not when the next residue is proline. These basic residues occur frequently, producing peptides ideal for MS analysis (typically 7-20 residues)
  • Predictable cleavage sites allow computational prediction of expected peptide masses, enabling database matching
  • Alternative enzymes like chymotrypsin (cleaves after aromatic residues) or Lys-C (lysine only) provide complementary coverage
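Because the cleavage rule is so predictable, an in silico digest takes only a few lines. This sketch assumes complete digestion (no missed cleavages) and applies the common convention that trypsin does not cut when proline follows K or R; the example sequence is made up.

```python
def trypsin_digest(seq: str) -> list[str]:
    """Split a protein sequence after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Keep the C-terminal peptide, which need not end in K or R.
        peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence chosen to show one blocked site (R followed by P).
    print(trypsin_digest("MKTAYRPLKGRSE"))
```

Search engines run this kind of digest over every protein in the database to build the list of expected peptide masses.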

Liquid Chromatography-Mass Spectrometry (LC-MS)

  • Separates peptides by hydrophobicity before MS analysis, reducing sample complexity and ion suppression effects
  • Reversed-phase chromatography is standard—peptides elute from a hydrophobic column as organic solvent concentration increases
  • Enables deep proteome coverage—modern LC-MS can identify >10,000 proteins from a single sample

Compare: Trypsin vs. alternative proteases—trypsin produces peptides with C-terminal basic residues that ionize well in positive mode MS, making it the default choice. Other enzymes are used when trypsin misses regions or when specific cleavage patterns are needed for complete sequence coverage.


Computational and Indirect Methods

Modern protein sequencing increasingly relies on computational tools and indirect approaches that leverage DNA sequence data. Algorithms match experimental data to theoretical predictions or translate nucleotide sequences into amino acid sequences.

Database Searching and Peptide Matching

  • Compares experimental MS/MS spectra against theoretical fragmentation patterns calculated from protein databases
  • Search algorithms (like SEQUEST, Mascot, or Andromeda in MaxQuant) score matches based on how well observed and predicted spectra align
  • False discovery rate (FDR) control quantifies statistical confidence: reported identifications are typically filtered to an estimated FDR of 1% or less
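One common way to estimate FDR is the target-decoy strategy: spectra are searched against the real (target) sequences plus reversed or shuffled (decoy) sequences, and decoy hits stand in for false matches. That strategy is an assumption here, since this guide does not name a method, and the scores below are invented for illustration.

```python
def fdr_at_threshold(psms: list[tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as decoy hits / target hits among matches scoring >= threshold.

    Each entry in `psms` is (score, is_decoy) for one peptide-spectrum match.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    # Hypothetical search results: (score, is_decoy)
    example = [(90, False), (85, False), (80, False), (70, True), (60, False), (50, True)]
    for cutoff in (75, 65, 45):
        print(f"score >= {cutoff}: estimated FDR = {fdr_at_threshold(example, cutoff):.2f}")
```

Raising the score cutoff lowers the estimated FDR but discards more true identifications, so in practice the cutoff is set to the most permissive score that still meets the FDR target.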

Sanger Sequencing

  • Sequences DNA, not protein directly—determines nucleotide order using chain-terminating dideoxynucleotides (ddNTPs)
  • Genetic code translation converts the DNA sequence to predicted amino acid sequence, assuming correct reading frame
  • Cannot detect post-translational modifications—the protein may differ from what the gene predicts due to processing, splicing, or chemical modifications
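The translation step can be sketched with the standard genetic code. The sketch assumes a coding-strand DNA sequence with no introns, and the example sequence is made up; it also shows how a wrong reading frame changes the predicted protein.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate coding-strand DNA from the given frame (0, 1, or 2) until a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCAAATGA"  # hypothetical open reading frame
    print("frame 0:", translate(gene))
    print("frame 1:", translate(gene, frame=1))
```

The output is only the encoded sequence: nothing in the DNA records whether the mature protein was later cleaved, acetylated, or phosphorylated.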

Compare: Direct protein sequencing (Edman, MS/MS) vs. DNA-based prediction (Sanger)—direct methods reveal the actual protein sequence including modifications, while DNA sequencing only predicts the encoded sequence. This distinction matters for identifying processed or modified proteins.


Quick Reference Table

Concept | Best Examples
Chemical/sequential degradation | Edman degradation, N-terminal sequencing
Mass-based identification | MS/MS, LC-MS, de novo sequencing
Terminus-specific analysis | N-terminal sequencing, C-terminal sequencing
Sample preparation | Trypsin digestion, LC separation
Computational identification | Database searching, de novo algorithms
Indirect sequencing | Sanger sequencing (DNA-based prediction)
Complex mixture analysis | LC-MS, MS/MS, database searching
Novel protein discovery | De novo sequencing

Self-Check Questions

  1. Which two techniques both determine amino acid sequence but differ in whether they require a reference database? What situation would favor each approach?

  2. Compare Edman degradation and tandem mass spectrometry—what fundamental principle does each use to determine sequence, and why has MS/MS largely replaced Edman for most applications?

  3. A researcher discovers a protein with a blocked N-terminus. Which sequencing approaches would still work, and which would fail? Explain the chemical basis for this limitation.

  4. Why is trypsin the most commonly used protease for MS-based proteomics? How do the properties of tryptic peptides benefit mass spectrometry analysis?

  5. FRQ-style: A lab identifies a protein by database searching but suspects it contains a post-translational modification not in the database. Describe two approaches they could use to characterize this modification, and explain why Sanger sequencing of the gene would be insufficient.