Protein analysis techniques are central to understanding how cells function. From extracting and purifying proteins to identifying them by mass spectrometry, these methods reveal which proteins a cell is making, how they interact, and how they're modified. Genomics and transcriptomics techniques complement this by showing gene expression across the entire genome. Together, these "omics" approaches give you a systems-level view of cellular activity that no single experiment could provide.
Protein Analysis Techniques
Protein extraction and purification techniques
Before you can study proteins, you need to get them out of cells and separate the ones you care about from everything else.
Extraction starts with cell lysis, which breaks open cells to release their contents. Three main approaches:
- Mechanical disruption (sonication, homogenization) physically shears cells open
- Detergent-based lysis (Triton X-100, SDS) dissolves cell membranes
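- Enzymatic digestion (lysozyme for bacterial cell walls, zymolyase for yeast cell walls) breaks down structural barriers that resist other lysis methods. Avoid proteases such as proteinase K here, since they would degrade the proteins you are trying to recover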
Buffer selection matters here. You choose pH, salt concentration, and reducing agents based on your target protein's solubility and stability. A membrane protein needs detergent in the buffer; a soluble cytoplasmic protein usually doesn't.
Purification separates your protein of interest from the thousands of others in the lysate. The main chromatography techniques each exploit a different protein property:
- Size exclusion chromatography separates by size (larger proteins elute first because they cannot enter the pores of the resin beads and so travel a shorter path through the column)
- Ion exchange chromatography separates by charge (proteins bind to an oppositely charged resin and are eluted by increasing salt concentration)
- Affinity chromatography captures proteins using a specific ligand (e.g., a His-tag binding to a nickel column), giving very high specificity
Electrophoretic separation is used both for analysis and as a purification step:
- SDS-PAGE denatures proteins with SDS detergent, coating them with a roughly uniform negative charge per unit mass, so they separate almost entirely by molecular weight
- Native PAGE keeps proteins in their folded state, separating by a combination of size, shape, and charge
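Because SDS-PAGE separates by size, you can estimate an unknown band's molecular weight from a ladder of known standards. Over a gel's resolving range, log10(MW) is approximately linear in relative migration distance (Rf). The sketch below interpolates between the two ladder bands that flank the unknown. The ladder values and Rf measurements are hypothetical, illustrative numbers, and `estimate_mw_kda` is a name introduced here, not a standard function.

```python
import math

def estimate_mw_kda(rf_unknown, ladder):
    """Interpolate log10(MW) between the two ladder bands flanking the unknown.

    ladder: list of (rf, mw_kda) tuples for the marker bands.
    """
    bands = sorted(ladder)  # order bands by increasing Rf
    for (rf_a, mw_a), (rf_b, mw_b) in zip(bands, bands[1:]):
        if rf_a <= rf_unknown <= rf_b:
            frac = (rf_unknown - rf_a) / (rf_b - rf_a)
            log_mw = math.log10(mw_a) + frac * (math.log10(mw_b) - math.log10(mw_a))
            return 10 ** log_mw
    raise ValueError("Rf lies outside the range covered by the ladder")

# Hypothetical ladder: (Rf, molecular weight in kDa)
ladder = [(0.10, 100.0), (0.30, 50.0), (0.50, 25.0), (0.70, 12.5)]
print(round(estimate_mw_kda(0.40, ladder), 1))
```

Treat the result as an estimate only: glycosylated or unusually charged proteins often migrate anomalously.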
Protein quantification
Once you've extracted protein, you need to know how much you have. Several assays do this by producing a measurable signal proportional to protein concentration:
- Colorimetric assays are the workhorses. The Bradford assay uses Coomassie blue dye binding (quick, but incompatible with detergents like SDS). The Lowry assay uses the Folin-Ciocalteu reagent (more sensitive but slower). The BCA assay uses bicinchoninic acid (compatible with detergents, making it popular for membrane protein work).
- UV-Vis spectroscopy at 280 nm measures absorbance from aromatic amino acids (mainly tryptophan and tyrosine). It's fast and doesn't consume sample, but accuracy depends on the protein's aromatic amino acid content, so you need the extinction coefficient of your specific protein for a reliable number.
- Fluorometric assays like Qubit and NanoOrange offer higher sensitivity and specificity, useful when you have very small amounts of protein.
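In practice, colorimetric assays are read against a standard curve: you measure a dilution series of a known protein, fit a line, and read the unknown off that line. A280 measurements instead use the Beer-Lambert law, A = ε × c × l. The sketch below shows both calculations. All numbers (standards, absorbances, extinction coefficient, molecular weight) are hypothetical, and the function names are introduced here for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration_from_curve(absorbance, slope, intercept):
    """Invert the standard curve: absorbance = slope * conc + intercept."""
    return (absorbance - intercept) / slope

def a280_to_mg_per_ml(a280, ext_coeff_per_m_cm, mw_da, path_cm=1.0):
    """Beer-Lambert: molar conc = A / (epsilon * l); mol/L * g/mol = g/L = mg/mL."""
    molar = a280 / (ext_coeff_per_m_cm * path_cm)
    return molar * mw_da

# Hypothetical standard series (mg/mL) and blank-uncorrected absorbances
std_conc = [0.0, 0.25, 0.5, 1.0, 2.0]
std_abs = [0.05, 0.30, 0.55, 1.05, 2.05]

slope, intercept = fit_line(std_conc, std_abs)
unknown = concentration_from_curve(0.80, slope, intercept)
print(f"unknown sample: {unknown:.2f} mg/mL")
```

Only trust values that fall inside the range of your standards; the relationship is typically not linear beyond it.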

Applications of protein analysis methods
Western blotting is the go-to method for detecting a specific protein in a complex mixture. The process has three key steps:
- Separate proteins by SDS-PAGE based on molecular weight
- Transfer (blot) the separated proteins onto a membrane (nitrocellulose or PVDF)
- Probe the membrane with a primary antibody specific to your target protein, then a secondary antibody conjugated to an enzyme or fluorophore for detection
Western blots are semi-quantitative. You can compare relative protein levels between conditions (e.g., treated vs. untreated cells), but they don't give you precise absolute amounts.
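A common way to make that relative comparison is densitometry: you measure each band's intensity, divide by a loading control band from the same lane, and express the result relative to a reference condition. The sketch below assumes hypothetical band intensities and a loading control (not described above); `normalized_fold_change` is an illustrative helper, not part of any imaging package.

```python
def normalized_fold_change(target, loading, reference="untreated"):
    """Normalize target band intensity to the loading control in each lane,
    then express every lane relative to the reference lane."""
    ratios = {lane: target[lane] / loading[lane] for lane in target}
    ref_ratio = ratios[reference]
    return {lane: ratio / ref_ratio for lane, ratio in ratios.items()}

# Hypothetical densitometry readings (arbitrary units)
target_band = {"untreated": 1200.0, "treated": 3000.0}
loading_band = {"untreated": 1000.0, "treated": 1250.0}

fold = normalized_fold_change(target_band, loading_band)
print(fold)
```

The output is a ratio between conditions, not an absolute amount, which is exactly what "semi-quantitative" means here.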
Immunoprecipitation (IP) uses antibody-coated beads to pull a specific protein out of a lysate. This is useful for:
- Isolating protein complexes: co-immunoprecipitation (co-IP) pulls down your target protein along with anything bound to it, revealing interaction partners
- Identifying post-translational modifications like phosphorylation or ubiquitination on the captured protein
Mass spectrometry (MS) is the most powerful tool for protein identification and quantification. Proteins are typically digested into peptides with trypsin, then ionized using one of two methods:
- MALDI (matrix-assisted laser desorption/ionization) embeds peptides in a crystalline matrix and uses a laser to ionize them
- ESI (electrospray ionization) sprays peptides from a liquid into a fine mist of charged droplets
From there, two identification strategies are common:
- Peptide mass fingerprinting matches the observed masses of peptides to theoretical masses from a protein database. This works well for identifying purified proteins.
- Tandem mass spectrometry (MS/MS) fragments individual peptides further, generating sequence information that enables de novo sequencing and more confident identification in complex mixtures.
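Peptide mass fingerprinting can be sketched in a few lines: digest a candidate sequence in silico, compute each peptide's theoretical mass, and check which observed masses fall within a tolerance. The sketch assumes the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, and observed values given as neutral monoisotopic masses rather than m/z. The example sequence and function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_masses(observed, sequence, tol_da=0.01):
    """Return (observed_mass, peptide) pairs that agree within tol_da."""
    theoretical = [(pep, peptide_mass(pep)) for pep in tryptic_digest(sequence)]
    return [(obs, pep) for obs in observed
            for pep, mass in theoretical if abs(obs - mass) <= tol_da]

# Hypothetical candidate sequence and observed masses
hits = match_masses([231.133, 500.0], "AKGRLLEK")
print(hits)
```

A real search engine scores matches across every protein in the database; this only shows the matching step for one candidate.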
Quantitative proteomics approaches let you compare protein levels across conditions:
- Label-free methods use spectral counting (how many MS/MS spectra are matched to a protein's peptides) or peak intensity to estimate abundance
- Isotope labeling methods like SILAC (stable isotope labeling by amino acids in cell culture) incorporate heavy isotopes metabolically while the cells are growing, whereas iTRAQ (isobaric tags for relative and absolute quantification) chemically labels peptides after digestion. Both allow multiplexed comparison of several conditions in a single MS run.
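For a SILAC-style experiment, the readout for each protein is the ratio of heavy to light signal, usually reported on a log2 scale so that up- and down-regulation are symmetric around zero. A minimal sketch, assuming the treated condition carries the heavy label and using hypothetical intensities and protein names:

```python
import math

def log2_heavy_to_light(heavy, light):
    """log2(heavy / light) for proteins quantified in both channels."""
    return {
        protein: math.log2(heavy[protein] / light[protein])
        for protein in heavy
        if protein in light and heavy[protein] > 0 and light[protein] > 0
    }

# Hypothetical summed peptide intensities per protein
heavy_intensity = {"P1": 4000.0, "P2": 500.0, "P3": 1000.0}
light_intensity = {"P1": 1000.0, "P2": 1000.0}

ratios = log2_heavy_to_light(heavy_intensity, light_intensity)
print(ratios)
```

A value of 2 means a four-fold increase in the heavy channel and -1 means a two-fold decrease. Proteins seen in only one channel (P3 here) are dropped rather than assigned a ratio.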
Genomics and Transcriptomics Techniques

DNA and RNA expression profiling
These techniques measure which genes are active and at what levels, giving you a snapshot of cellular state.
DNA microarrays were the first widely used high-throughput expression profiling tool. Thousands of oligonucleotide probes are fixed to a solid surface (glass slide or silicon chip). Fluorescently labeled cDNA or cRNA from your sample hybridizes to complementary probes, and fluorescence intensity reflects relative expression levels. Microarrays are still used for:
- Differential gene expression analysis between conditions or cell types
- Genotyping to detect genetic variations
- SNP analysis to identify single nucleotide polymorphisms across the genome
RNA sequencing (RNA-seq) has largely replaced microarrays for expression profiling. The workflow involves converting RNA to cDNA, building a sequencing library, and performing high-throughput sequencing. Read counts for each gene serve as a measure of transcript abundance.
RNA-seq has several advantages over microarrays:
- Higher sensitivity for detecting low-abundance transcripts
- Wider dynamic range, capturing expression levels across several orders of magnitude
- Better resolution, distinguishing closely related sequences, splice variants, and even novel transcripts or fusion genes that wouldn't appear on a pre-designed microarray
The trade-off is that RNA-seq generates far more data and requires more computational analysis.
Bioinformatics for omics data analysis
Raw omics data is meaningless without computational analysis. Bioinformatics tools turn millions of data points into biological insight.
Data preprocessing cleans up raw data before analysis:
- Quality control and filtering removes low-quality reads and adaptor sequences
- Normalization adjusts for technical differences such as sequencing depth (library size) and RNA composition so that samples are comparable
- Batch effect correction removes systematic biases introduced when samples are processed at different times or on different instruments
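The simplest depth normalization is counts per million (CPM): divide each gene's read count by the sample's total count and multiply by one million. The sketch below uses hypothetical counts for two samples sequenced at different depths. Dedicated tools use more robust schemes, so treat this as an illustration of the idea rather than a recommended pipeline.

```python
def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: count / total * 1_000_000 for gene, count in counts.items()}

# Hypothetical raw counts: sample_b was sequenced twice as deeply as sample_a
sample_a = {"geneA": 100, "geneB": 900}
sample_b = {"geneA": 200, "geneB": 1800}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a["geneA"], cpm_b["geneA"])
```

After scaling, geneA has the same value in both samples, which shows that the two-fold difference in raw counts came from sequencing depth, not biology.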
Functional annotation assigns biological meaning to your gene or protein lists. Gene Ontology (GO) terms categorize genes into three domains:
- Biological process (e.g., apoptosis, cell division)
- Molecular function (e.g., kinase activity, DNA binding)
- Cellular component (e.g., nucleus, mitochondrial membrane)
Pathway analysis tools map genes and proteins onto known biological pathways. KEGG, Reactome, and BioCyc are the most commonly used databases. This helps you move from "here are 500 differentially expressed genes" to "the MAPK signaling pathway is significantly upregulated."
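One common way to decide whether a pathway or GO term is over-represented in your gene list is a hypergeometric test (equivalent to a one-sided Fisher's exact test). It asks: if you drew your list at random from all measured genes, how likely is it to contain at least this many pathway members? A minimal sketch with hypothetical gene counts; `enrichment_p_value` is a name introduced here.

```python
from math import comb

def enrichment_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_background: all genes measured
    n_in_pathway: background genes annotated to the pathway
    n_selected:   genes in your list (e.g. differentially expressed)
    n_overlap:    genes in your list that belong to the pathway
    """
    upper = min(n_in_pathway, n_selected)
    favorable = sum(
        comb(n_in_pathway, i) * comb(n_background - n_in_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_selected)

# Hypothetical example: 500 DE genes out of 20,000; 20 fall in a 200-gene pathway
p = enrichment_p_value(20_000, 200, 500, 20)
print(f"enrichment p-value: {p:.3g}")
```

By chance you would expect about 5 pathway genes in the list (500 × 200 / 20,000), so an overlap of 20 gives a small p-value. Because many pathways are tested at once, these p-values also need multiple testing correction.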
Statistical analysis for differential expression uses specialized tools:
- DESeq2 and edgeR are designed for RNA-seq count data, using negative binomial models
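- limma was originally built for microarray data but has been adapted for RNA-seq through its voom transformation, which models the mean-variance relationship of log-transformed counts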
Because you're testing thousands of genes simultaneously, multiple testing correction is essential. Without it, testing 20,000 genes at p < 0.05 would yield about 1,000 false positives by chance alone, even if no gene were truly differentially expressed. The Benjamini-Hochberg procedure, which controls the false discovery rate (FDR), is most commonly used because it balances sensitivity with control of false positives. The Bonferroni correction controls the family-wise error rate instead; it is more conservative and appropriate when you need very strict control.
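Both corrections are simple enough to write out. Bonferroni multiplies each p-value by the number of tests. Benjamini-Hochberg ranks the p-values, scales each by (number of tests / rank), and then enforces monotonicity from the largest rank downward. The sketch below is a plain-Python illustration with hypothetical p-values, not the code any particular package uses.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))
print("BH (FDR):  ", benjamini_hochberg(raw))
```

With a 0.05 cutoff, all four genes pass after BH adjustment but only two pass after Bonferroni, which illustrates how much more conservative Bonferroni is.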
Data visualization helps you interpret and communicate results:
- Heatmaps display expression levels as a color-coded matrix, making it easy to spot clusters of co-regulated genes
- Volcano plots graph statistical significance against fold change, so you can quickly identify genes that are both highly changed and statistically significant
- PCA (principal component analysis) reduces high-dimensional data to two or three dimensions, revealing how samples cluster and whether replicates group together
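To make the volcano plot concrete: each gene becomes a point with log2 fold change on the x-axis and -log10 of its adjusted p-value on the y-axis, and genes passing both cutoffs are highlighted. The sketch below computes those coordinates and labels. The cutoffs (|log2FC| ≥ 1, adjusted p < 0.05) are common conventions assumed here, and the gene names and values are hypothetical.

```python
import math

def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """results maps gene -> (log2_fold_change, adjusted_p_value).

    Returns a list of (gene, x, y, label) where label is 'up', 'down', or 'ns'.
    """
    points = []
    for gene, (log2fc, padj) in results.items():
        y = -math.log10(max(padj, 1e-300))  # guard against log10(0)
        if padj < padj_cutoff and abs(log2fc) >= lfc_cutoff:
            label = "up" if log2fc > 0 else "down"
        else:
            label = "ns"  # not significant or change too small
        points.append((gene, log2fc, y, label))
    return points

# Hypothetical differential expression results
results = {
    "geneA": (2.0, 0.001),
    "geneB": (-1.5, 0.01),
    "geneC": (0.2, 0.5),
    "geneD": (3.0, 0.2),
}
for point in volcano_points(results):
    print(point)
```

Note that geneD has the largest fold change but is labeled "ns" because its adjusted p-value is too high. Catching cases like this is the reason the plot shows both axes together.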
Interaction networks and protein-protein interaction (PPI) databases like STRING and BioGRID let you visualize relationships between proteins, helping you identify hubs and functional modules within your dataset.
Public data repositories store published datasets for reuse and meta-analysis. GEO (Gene Expression Omnibus) and ArrayExpress host microarray and RNA-seq data. Proteomics datasets are deposited through the ProteomeXchange consortium, of which PRIDE is a member repository. These are valuable resources for validating your own findings or mining existing data for new hypotheses.