Molecular Data for Evolution
Molecular evidence for evolution reveals species relationships and evolutionary timelines by comparing DNA and protein sequences across organisms. These comparisons, combined with molecular clock techniques, have reshaped how scientists reconstruct evolutionary history, often confirming and sometimes overturning conclusions drawn from fossils and anatomy alone.
DNA and Protein Sequence Comparisons
The core logic is straightforward: species that share a recent common ancestor will have more similar DNA and protein sequences than species whose lineages diverged long ago. Mutations accumulate over time, so the longer two lineages have been separated, the more genetic differences you'll find between them.
- Humans and chimpanzees share roughly 98–99% of their DNA sequences, reflecting a relatively recent common ancestor (around 6–7 million years ago)
- Humans and mice share less DNA similarity because their lineages diverged much earlier, around 80–90 million years ago
- Conserved sequences are stretches of DNA that remain highly similar across very different species, typically because they code for proteins (or other functional elements) essential to survival. Changes in these regions tend to be harmful, so natural selection weeds them out.
- The gene for cytochrome c, a protein critical to cellular respiration, is highly conserved from yeast to humans. The fact that such distant organisms share nearly identical versions of this gene is strong evidence for common ancestry.
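The comparison logic above can be sketched as a percent-identity calculation over two sequences that have already been aligned. This is a minimal illustration, not a standard tool: the function name `percent_identity` and the three toy sequences are invented for the example and are not real gene data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of compared positions at which two aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap ("-").
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Toy sequences, invented for illustration only.
reference = "ATGGCCAAGTTCGACCTGAA"
close_relative = "ATGGCCAAGTTCGACCTGAG"    # differs at 1 of 20 sites
distant_relative = "ATGGTCAAGTACGATCTCAG"  # differs at 5 of 20 sites

print(percent_identity(reference, close_relative))    # 95.0
print(percent_identity(reference, distant_relative))  # 75.0
```

The lineage that split off earlier shows the lower identity, which is the pattern the human-chimpanzee and human-mouse comparisons illustrate on a genome scale.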
Comparative Genomics and Phylogenetic Trees
Comparative genomics involves analyzing entire genome sequences from different organisms to identify shared genes, conserved regions, and patterns of change. This goes beyond comparing single genes and gives a much fuller picture of evolutionary history.
Molecular data are used to build phylogenetic trees, branching diagrams that visually represent evolutionary relationships. On these trees:
- Each branching point (node) represents a common ancestor where two lineages split
- Branch length typically represents the amount of genetic change between lineages. Longer branches mean more accumulated mutations and greater evolutionary distance.
- A phylogenetic tree of primates built from DNA data can show not just that humans and gorillas are related but, once its branch lengths are calibrated with a molecular clock, approximately when their lineages diverged
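To make nodes and branch lengths concrete, here is a small sketch that stores a tree as child-to-parent links and sums branch lengths between two species. The tree shape follows the primate example, but the node names and branch lengths are hypothetical values chosen for illustration.

```python
# Hypothetical toy tree: child -> (parent, branch length in substitutions per site).
TREE = {
    "human": ("hc_ancestor", 0.006),
    "chimp": ("hc_ancestor", 0.006),
    "hc_ancestor": ("root", 0.002),
    "gorilla": ("root", 0.008),
}


def path_to_root(tree, node):
    """Map each node on the path from `node` up to the root to its distance from `node`."""
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def evolutionary_distance(tree, a, b):
    """Return (most recent common ancestor, summed branch length between a and b)."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # Dicts keep insertion order, so the first shared node walking up from b
    # is the most recent common ancestor.
    for ancestor, dist_b in up_b.items():
        if ancestor in up_a:
            return ancestor, up_a[ancestor] + dist_b
    raise ValueError("the two nodes do not share an ancestor in this tree")


print(evolutionary_distance(TREE, "human", "chimp"))
print(evolutionary_distance(TREE, "human", "gorilla"))
```

With these made-up lengths, human and chimp meet at `hc_ancestor` with a smaller total distance than human and gorilla, which only meet at the root.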
Comparing mammalian genomes has identified conserved genomic regions shared across species and clarified how different mammalian orders are related to one another.
Molecular Clocks for Timing

Neutral Theory and Types of Molecular Clocks
A molecular clock estimates when evolutionary events occurred based on the rate at which genetic changes accumulate. Its main theoretical justification comes from the neutral theory of molecular evolution, proposed by Motoo Kimura in 1968. This theory holds that most of the genetic changes that become established at the molecular level are selectively neutral, meaning they neither help nor harm the organism. Because neutral mutations aren't subject to natural selection, they accumulate at a roughly steady rate over time, like ticks of a clock.
There are several types of molecular clocks, each suited to different questions:
- Strict molecular clock: Assumes genetic change happens at a constant rate across all lineages. This is the simplest model but often unrealistic.
- Relaxed molecular clock: Allows the rate of change to vary among lineages. This accounts for real-world differences in generation time, metabolic rate, and population size that affect how quickly genetic changes accumulate.
- Mitochondrial DNA (mtDNA) clock: Based on the faster mutation rate of mtDNA compared to nuclear DNA. Because mutations accumulate quickly in mtDNA, this clock is especially useful for studying recent events, such as human migrations and population splits within the past few hundred thousand years.
- Protein clock: Tracks amino acid substitutions in proteins, which accumulate more slowly than DNA mutations. This makes it better suited for studying deep evolutionary divergences, like the split between major animal phyla hundreds of millions of years ago.
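One way to check whether a strict clock is plausible for two lineages is a relative rate test. The sketch below follows the idea of Tajima's relative rate test: using an outgroup, count the sites where only lineage 1 has changed and the sites where only lineage 2 has changed, then compare the two counts with a chi-square statistic. The function name and the toy sequences are assumptions made for this example.

```python
def tajima_relative_rate(seq1: str, seq2: str, outgroup: str):
    """Return (m1, m2, chi_square) for three aligned sequences.

    m1 counts sites where seq1 alone differs (seq2 matches the outgroup);
    m2 counts sites where seq2 alone differs (seq1 matches the outgroup).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # ignore gapped columns
        if a != b and b == o:
            m1 += 1  # change unique to lineage 1
        elif a != b and a == o:
            m2 += 1  # change unique to lineage 2
    if m1 + m2 == 0:
        return m1, m2, 0.0
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    return m1, m2, chi_square


# Toy data, invented for illustration: lineage_fast carries 9 unique changes,
# lineage_slow carries 1.
outgroup = "A" * 20
lineage_fast = "C" * 9 + "A" * 11
lineage_slow = "A" * 19 + "G"

m1, m2, chi_square = tajima_relative_rate(lineage_fast, lineage_slow, outgroup)
# With 1 degree of freedom, values above about 3.84 reject equal rates at the 5% level.
print(m1, m2, chi_square, chi_square > 3.84)
```

If the two counts are very unequal, as in this toy case, the lineages are unlikely to share a single rate, and a relaxed clock is the safer choice.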
Calibration and Limitations of Molecular Clocks
A molecular clock on its own only tells you relative timing. To convert genetic differences into actual dates, you need calibration points from the fossil record or well-established geological events.
For example, the fossil record places the divergence of birds and mammals at roughly 310 million years ago. Scientists can use that date as an anchor, then calculate mutation rates and estimate other divergence times from there.
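The calibration arithmetic can be sketched under a strict clock, where the genetic distance between two species is roughly 2 × rate × time (the factor of 2 reflects that both lineages accumulate changes after the split). The 310-million-year anchor comes from the example above; both genetic distances below are hypothetical numbers chosen to keep the arithmetic easy to follow.

```python
def calibrate_rate(distance: float, calibration_age_myr: float) -> float:
    """Substitutions per site per million years along one lineage."""
    return distance / (2.0 * calibration_age_myr)


def estimate_divergence_time(distance: float, rate: float) -> float:
    """Divergence time in millions of years, assuming a constant rate."""
    return distance / (2.0 * rate)


CALIBRATION_AGE_MYR = 310.0   # bird-mammal split, from the fossil record
bird_mammal_distance = 0.62   # hypothetical substitutions per site
other_pair_distance = 0.16    # hypothetical distance for a pair of unknown age

rate = calibrate_rate(bird_mammal_distance, CALIBRATION_AGE_MYR)
age = estimate_divergence_time(other_pair_distance, rate)
print(f"rate = {rate:.4f} substitutions/site/Myr")
print(f"estimated divergence = {age:.0f} million years ago")
```

With these made-up inputs the calibrated rate is about 0.001 substitutions per site per million years, which dates the second pair to roughly 80 million years ago. A different anchor or a different distance estimate would shift that date.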
Limitations to keep in mind:
- Different calibration points and clock models can produce different estimates. Studies of the human-chimpanzee divergence have yielded dates ranging from about 4 to 8 million years ago.
- Evolutionary rates aren't perfectly constant. Factors like changes in population size, shifts in selective pressure, or differences in generation time can speed up or slow down the clock.
- Horizontal gene transfer (the movement of genetic material between species, common in bacteria) can distort clock estimates when they're based on a single gene, because the gene's history may not match the species' history.
- Incomplete lineage sorting, where ancestral genetic variation persists through multiple speciation events, can also complicate the picture. If two genes diverged before the species carrying them diverged, the gene tree won't match the species tree, leading to misleading date estimates.
Molecular Phylogenetics for Trees

Constructing Evolutionary Trees
Molecular phylogenetics uses DNA or protein sequence data to infer evolutionary relationships and common ancestry. The resulting phylogenetic trees are branching diagrams where each fork represents a point where one lineage split into two.
Building these trees involves several steps:
- Collect sequence data from the species you want to compare (a specific gene, multiple genes, or whole genomes).
- Align the sequences so that corresponding positions can be compared across species. This step matches up homologous nucleotide or amino acid positions, accounting for insertions and deletions that have occurred since the sequences diverged.
- Apply a statistical method to evaluate which tree structure best fits the data. Common methods include:
  - Maximum likelihood: Tests many possible tree arrangements and selects the one most probable given the data and a model of how sequences evolve
  - Bayesian inference: Combines the sequence data with prior assumptions (for example, about evolutionary rates) to estimate how probable each candidate tree is
- Assess reliability, often through techniques like bootstrapping, which resamples the data many times to test how consistently each branch of the tree is supported. A bootstrap value above 70% is generally considered decent support; above 95% is strong.
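The bootstrapping step can be illustrated with a deliberately simplified sketch. Real analyses rebuild a full tree for every replicate; here, as a stand-in, each replicate only asks which two species are closest by raw proportion of differing sites. The species labels, toy sequences, and function names are all assumptions made for this example.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment: dict) -> tuple:
    """Return the pair of names (alphabetical order) with the smallest p-distance."""
    names = sorted(alignment)
    pairs = [(names[i], names[j])
             for i in range(len(names)) for j in range(i + 1, len(names))]
    # Ties go to the first pair in alphabetical order.
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment: dict, pair: tuple,
                      replicates: int = 200, seed: int = 0) -> float:
    """Fraction of resampled alignments in which `pair` is still the closest pair."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(seq[c] for c in cols)
                     for name, seq in alignment.items()}
        if closest_pair(resampled) == pair:
            hits += 1
    return hits / replicates


# Toy alignment, invented for illustration (30 sites per sequence).
alignment = {
    "human":   "ACGTACGTACGTACGTACGTACGTACGTAC",
    "chimp":   "CCGTACGTACGTACGTACGTACGTACGTAC",
    "gorilla": "ACGTTCGTTCGTTCGTTCGTTCGTTCGTAC",
}

best = closest_pair(alignment)
support = bootstrap_support(alignment, best)
print(best, f"{100 * support:.0f}% bootstrap support")
```

Because the grouping is backed by several independent sites, it survives most resamplings. A grouping that depended on only one or two columns would drop out of many replicates and receive low support.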
Reliability and Integration of Phylogenetic Evidence
The quality of a phylogenetic analysis depends on several factors:
- Data quantity: Trees based on multiple genes or whole genomes are generally more reliable than single-gene trees, because any one gene might have its own unusual evolutionary history
- Choice of genetic markers: Some regions of DNA evolve faster than others, so the markers you choose should match the timescale you're studying. Fast-evolving markers work for recent divergences; slow-evolving ones work for ancient splits.
- Model assumptions: The evolutionary model used in the analysis affects the results. Choosing an inappropriate model can lead to inaccurate trees.
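To see why model assumptions matter, compare the raw proportion of differing sites with a distance corrected under the Jukes-Cantor model, one of the simplest substitution models. It assumes every nucleotide change is equally likely, which is a strong simplification, and it is used here only as an illustration. The correction accounts for sites that changed more than once and therefore hide some of the real change.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site from the observed proportion of differing sites.

    Uses d = -(3/4) * ln(1 - (4/3) * p), valid for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


for observed in (0.01, 0.10, 0.30, 0.60):
    corrected = jukes_cantor_distance(observed)
    print(f"observed {observed:.2f} -> corrected {corrected:.3f}")
```

For closely related sequences the correction is negligible, but it grows quickly as sequences diverge. Ignoring it would underestimate long branches, which is one way an ill-fitting model can distort a tree.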
Molecular phylogenetics has resolved relationships that morphology and fossils alone couldn't settle. For example, molecular data clarified how the major groups of eukaryotes (plants, animals, fungi) are related, showing that fungi are more closely related to animals than to plants. This was a genuine surprise, since fungi look nothing like animals, but their protein and ribosomal RNA sequences tell a different story.
The strongest conclusions come from integrating molecular evidence with other lines of evidence:
- Comparative anatomy can confirm or challenge groupings suggested by molecular data
- Biogeography (the geographic distribution of species) can help explain patterns of divergence
- Fossil evidence provides calibration points and physical confirmation of when lineages existed
Combining these approaches gives a more complete and reliable picture of evolutionary history than any single method alone.
Molecular Techniques in Evolution
PCR and DNA Sequencing
Two techniques have been especially transformative for molecular evolutionary studies:
Polymerase Chain Reaction (PCR) amplifies specific DNA sequences from tiny or degraded samples. This is critical because researchers often work with limited material.
- PCR can amplify DNA from a single cell or from ancient, partially degraded specimens
- It allows researchers to target specific genetic markers (like mitochondrial genes) for focused evolutionary comparisons
- Before PCR, studying the genetics of rare or extinct species was nearly impossible
DNA sequencing determines the exact order of nucleotides in a DNA molecule. Two generations of technology matter here:
- Sanger sequencing (developed by Frederick Sanger in the late 1970s) was the first widely used method and was instrumental in early gene and genome sequencing projects, including the Human Genome Project
- Next-generation sequencing (NGS), emerging in the mid-2000s, dramatically increased speed and reduced cost, making it feasible to sequence entire genomes from many species. Comparing the human genome with other primate genomes has helped identify genetic changes linked to human-specific traits and clarified primate evolutionary history.
Expanding the Scope of Evolutionary Studies
Large-scale genetic datasets have driven the development of computational tools for analyzing sequence data:
- BLAST (Basic Local Alignment Search Tool) lets researchers search databases to find sequences similar to a query sequence across thousands of species
- MEGA (Molecular Evolutionary Genetics Analysis) provides tools for sequence alignment, phylogenetic tree construction, and molecular clock analysis
These techniques have opened up entirely new areas of study:
- Ancient DNA (aDNA) extracted from fossils and other preserved remains has revealed the evolutionary history and genetic diversity of extinct species like woolly mammoths and Neanderthals. Svante Pääbo, who received the 2022 Nobel Prize in Physiology or Medicine for this work, and his colleagues identified interbreeding events between Neanderthals and early modern humans. Most people of non-African descent carry roughly 1–4% Neanderthal DNA as a result.
- Environmental DNA (eDNA), collected from soil or water samples, allows scientists to detect and study species without ever observing them directly. This has proven valuable for tracking biodiversity and identifying species in ecosystems that are difficult to survey.
Integrating molecular data with morphological and ecological evidence continues to deepen our understanding of how evolution operates across different scales, from individual adaptations shaped by natural selection to the large-scale diversification of entire lineages.