This lab uses a real bioinformatics tool called BLAST (Basic Local Alignment Search Tool) to compare DNA sequences across species and draw conclusions about how closely related those species are. You are not just reading about evolution here. You are actually doing the kind of molecular analysis that working biologists use to build evolutionary trees.

Why This Lab Matters for the AP Exam
The AP exam will ask you to interpret phylogenetic trees, evaluate different types of evolutionary evidence, and explain why molecular data is often more reliable than morphological data alone. This lab gives you hands-on experience with all three of those skills. Free-response questions frequently ask students to justify evolutionary relationships using sequence data, and this lab is exactly the kind of context those questions are built around.
CED Connections
This lab directly supports two topics in Unit 7.
Topic 7.6: Evidence of Evolution
- LO 7.6.A / EK 7.6.A.1: Evolution is supported by evidence from multiple disciplines. BLAST gives you biochemical evidence in the form of DNA and protein sequence comparisons, which sits alongside geological and morphological evidence as a valid line of support.
- LO 7.6.B / EK 7.6.B.1: Molecular, morphological, and genetic evidence from living and extinct organisms all contribute to our understanding of evolution. The lab asks you to compare these types of evidence directly.
- LO 7.6.B / EK 7.6.B.2: Comparing DNA nucleotide sequences and protein amino acid sequences provides evidence for evolution and common ancestry. This is the core of what BLAST does.
Topic 7.9: Phylogeny
- LO 7.9.A / EK 7.9.A.1, A.2, A.3: Phylogenetic trees and cladograms are hypotheses about evolutionary relationships. You will use BLAST output to build or evaluate these hypotheses, and you will practice identifying the outgroup and shared derived characters.
- LO 7.9.B / EK 7.9.B.1, B.2, B.3: You will use sequence similarity data to construct or interpret trees, identify nodes as most recent common ancestors, and recognize that these trees are always subject to revision as new evidence comes in.
What You Need to Be Able to Do
Here are the concrete skills this lab builds:
- Run a BLAST search by inputting a DNA or protein sequence and interpreting the results table (percent identity, E-value, alignment score)
- Compare sequence similarity across multiple species and use those comparisons to rank evolutionary relatedness
- Construct or interpret a cladogram or phylogenetic tree from molecular data, correctly placing nodes and identifying the outgroup
- Distinguish between a cladogram and a phylogenetic tree, specifically that phylogenetic trees include a time scale or meaningful branch lengths while cladograms do not
- Evaluate molecular data vs. morphological data and explain why molecular evidence is often considered more reliable
- Write a claim-evidence-reasoning (CER) response that uses BLAST percent identity or sequence alignment data as evidence for a specific evolutionary relationship
- Connect molecular evidence to other lines of evidence including fossil evidence, morphological homologies, and the molecular clock
Core Concepts
Molecular Phylogenetics
Molecular phylogenetics is the practice of using DNA, RNA, or protein sequences to figure out how species are related to each other. The core idea is simple: species that share a more recent common ancestor will have more similar sequences, because they have had less time to accumulate mutations since they diverged.
BLAST is one tool used in this process. It takes a sequence you provide and searches a database to find the most similar sequences from other organisms. The output tells you how closely your sequence matches others, which you can use to infer evolutionary relatedness.
DNA Sequences and Sequence Comparison
DNA sequences are the specific order of nucleotide bases (A, T, C, G) in a segment of DNA. When two species share a high percentage of identical bases in the same gene, that similarity is evidence of shared ancestry. The more similar the sequences, the more recently those species likely diverged from a common ancestor.
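To make "percentage of identical bases" concrete, here is a minimal sketch that counts matching positions between two sequences. It assumes the sequences are already aligned, equal in length, and free of gaps. BLAST itself finds the aligned region and handles gaps for you, so this is only an illustration of the idea. The function name and the three short sequences are invented for the example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions where two pre-aligned, ungapped sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where both sequences carry the same base.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 20-base fragments of the same gene (invented, not real data).
species_1 = "ATGCCGTTAGCTAGGCTTAA"
species_2 = "ATGCCGTTAGCTAGGCTTAT"  # differs from species_1 at 1 position
species_3 = "ATGACGTTCGCTAGACTTAT"  # differs from species_1 at 4 positions

print(percent_identity(species_1, species_2))  # 95.0
print(percent_identity(species_1, species_3))  # 80.0
```

In this made-up case, species 1 and 2 would be inferred to share a more recent common ancestor than species 1 and 3.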
Researchers often use specific genes for these comparisons. Mitochondrial DNA (mtDNA) is commonly used because it mutates at a relatively consistent rate and is inherited maternally without recombination. This makes it useful for tracing lineages. Ribosomal RNA genes (like 16S or 18S rRNA) are used to compare very distantly related organisms because they are highly conserved, meaning they change slowly enough that the sequences can still be lined up and compared across very distant lineages.
The Molecular Clock
The molecular clock is the idea that mutations accumulate in DNA at a roughly constant rate over time. If you know how fast a particular gene mutates, you can estimate how long ago two species diverged based on how different their sequences are. Molecular clocks are often calibrated using fossil evidence to anchor the timeline.
This is different from fossil dating methods like radiometric dating (carbon-14, potassium-argon), which use the decay of isotopes to determine the age of a fossil or of the rock layers around it. Both approaches give you time-based evidence, but the molecular clock works from the DNA of living organisms rather than preserved remains.
Phylogenetic Trees vs. Cladograms
A phylogenetic tree shows evolutionary relationships and includes information about the amount of change over time or the time scale itself. Branch lengths can represent either time or the degree of genetic change.
A cladogram also shows evolutionary relationships, but it does not convey time scale or the degree of evolutionary difference between groups. It only shows the pattern of branching, meaning which groups share a more recent common ancestor than others.
Both are hypotheses, not facts. They represent our best current interpretation of the available evidence, and they get revised when new data comes in (EK 7.9.B.3).
Nodes, Outgroups, and Shared Derived Characters
Each node on a tree represents the most recent common ancestor of the lineages branching from it. The further back a node is, the more distantly related those groups are.
The outgroup is the species or lineage that is least closely related to all the others in your analysis. You use it as a reference point. Traits shared by the outgroup and the ingroup species are likely ancestral, not derived. Traits found in two or more ingroup species but absent from the outgroup are more likely to be shared derived characters (synapomorphies), which are the informative ones for building trees.
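The outgroup logic can be sketched as a small trait table. This is an illustration under simplifying assumptions, not a full cladistic analysis: a trait present in the outgroup is labeled ancestral, and a trait absent from the outgroup but present in two or more ingroup species is labeled shared derived. The species and trait names are hypothetical placeholders.

```python
# Hypothetical presence/absence data (invented for illustration).
TRAITS = {
    "Outgroup":  {"trait_w"},
    "Species A": {"trait_w", "trait_x", "trait_y"},
    "Species B": {"trait_w", "trait_x", "trait_y"},
    "Species C": {"trait_w", "trait_x", "trait_z"},
}


def classify_traits(traits, outgroup):
    """Label each trait using the outgroup as the reference point."""
    ingroup = [species for species in traits if species != outgroup]
    all_traits = set().union(*traits.values())
    labels = {}
    for trait in sorted(all_traits):
        carriers = [s for s in ingroup if trait in traits[s]]
        if trait in traits[outgroup]:
            labels[trait] = "ancestral"
        elif len(carriers) >= 2:
            labels[trait] = "shared derived"
        else:
            labels[trait] = "derived, unique to one species"
    return labels


for trait, label in classify_traits(TRAITS, "Outgroup").items():
    print(trait, "->", label)
```

Here `trait_x` groups all three ingroup species and `trait_y` groups only A and B, so `trait_y` is the character that supports A and B sharing the most recent common ancestor.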
Morphological Data vs. Molecular Data
Morphological data refers to physical traits, things like body structure, bone arrangement, or organ presence. Morphological homologies (like the pentadactyl limb shared by mammals, birds, and reptiles) are strong evidence of common ancestry.
However, morphology can be misleading. Convergent evolution happens when distantly related species independently evolve similar traits because they face similar environmental pressures. Dolphins and sharks both have streamlined bodies and fins, but they are not closely related. Molecular data cuts through this problem because DNA similarity is much harder to fake through convergence. This is why EK 7.9.A.3.ii states that molecular data typically provides more accurate and reliable evidence than morphological traits.
How the Lab Works
The investigation is built around a central question: can you use DNA sequence comparisons to determine how closely related a set of species are, and does that match up with what morphological or fossil evidence suggests?
You start with a DNA or protein sequence from one organism and run it through BLAST. The tool returns a ranked list of matches from other species, along with statistics that tell you how similar each match is. The key output to focus on is percent identity, which tells you what percentage of the bases (or amino acids) in the aligned region are identical between your query and the match.
From there, you compare multiple species to each other, not just one pair. You are building a picture of relative relatedness. If species A shares 98% identity with species B but only 82% identity with species C, that tells you A and B are more closely related to each other than either is to C.
Once you have your similarity data, you use it to construct or evaluate a cladogram or phylogenetic tree. You decide where to place the outgroup, where the nodes go, and which species cluster together. Then you compare your molecular tree to what morphological data or fossil evidence would suggest. Sometimes they agree. Sometimes they do not, and that disagreement is actually scientifically interesting because it might point to convergent evolution or gaps in the fossil record.
The lab also asks you to think critically about what your data means. A BLAST result is not proof of a relationship. It is evidence that supports a hypothesis, and like all hypotheses in phylogenetics, it can be revised.
Data and Analysis Moves
Reading BLAST Output
When you get BLAST results, focus on these values:
- Percent identity: the percentage of nucleotides (or amino acids) that are identical between your query and the match. Higher percent identity = more similar sequences = more closely related.
- E-value: the number of matches with a score this good that you would expect to find by chance in a database of this size. A very small E-value (like 0.0 or 1e-50) means the match is highly significant, not random.
- Alignment score (bit score): a measure of overall alignment quality. Higher scores indicate better matches.
For this lab, percent identity is usually the most useful number for comparing relatedness across species.
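If you ever work with BLAST results as a file instead of the web table, the same three values appear as columns. The sketch below assumes the tab-separated format produced by the BLAST+ command-line tools with `-outfmt 6` and its default column order. The hit rows, the function names, and the `1e-5` E-value cutoff are all invented for illustration, not real results or a required threshold.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Hypothetical hits (invented for illustration, not real BLAST results).
SAMPLE = """\
query1\toutgroup\t71.00\t300\t87\t0\t1\t300\t1\t300\t3e-30\t125
query1\tspecies_B\t97.00\t300\t9\t0\t1\t300\t1\t300\t1e-150\t520
query1\tspecies_C\t84.00\t300\t48\t0\t1\t300\t1\t300\t2e-80\t290
"""


def parse_hits(text):
    """Turn tab-separated BLAST rows into dictionaries with numeric fields."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


def rank_hits(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, highest percent identity first."""
    significant = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(significant, key=lambda h: h["pident"], reverse=True)


ranked = rank_hits(parse_hits(SAMPLE))
for hit in ranked:
    print(hit["sseqid"], hit["pident"], hit["evalue"], hit["bitscore"])
```

Sorting by percent identity after filtering on E-value mirrors the reading order suggested above: first check that a match is significant, then compare how similar it is.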
Building a Similarity Matrix
Once you have BLAST results for multiple species, organize your data into a similarity matrix. This is a table where each row and column is a species, and each cell shows the percent identity between that pair. It makes it much easier to see which species cluster together.
| | Species A | Species B | Species C | Outgroup |
|---|---|---|---|---|
| Species A | 100% | 97% | 84% | 71% |
| Species B | 97% | 100% | 83% | 70% |
| Species C | 84% | 83% | 100% | 72% |
| Outgroup | 71% | 70% | 72% | 100% |
From this table, you can see that A and B are most closely related, C is more distantly related to both, and the outgroup is least related to all of them.
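The same reading of the table can be checked with a few lines of code. This sketch stores the percent identity values from the example matrix, then finds the most similar pair and the species with the lowest average identity to everything else. The function names are made up for the example.

```python
from itertools import combinations

SPECIES = ["A", "B", "C", "Outgroup"]

# Percent identity for each pair, taken from the example matrix.
IDENTITY = {
    frozenset(["A", "B"]): 97,
    frozenset(["A", "C"]): 84,
    frozenset(["A", "Outgroup"]): 71,
    frozenset(["B", "C"]): 83,
    frozenset(["B", "Outgroup"]): 70,
    frozenset(["C", "Outgroup"]): 72,
}


def identity(x, y):
    """Look up percent identity; a species is 100% identical to itself."""
    return 100 if x == y else IDENTITY[frozenset([x, y])]


def closest_pair():
    """Return the pair of species with the highest percent identity."""
    return max(combinations(SPECIES, 2), key=lambda pair: identity(*pair))


def mean_identity_to_others(species):
    """Average percent identity between one species and all the others."""
    others = [s for s in SPECIES if s != species]
    return sum(identity(species, s) for s in others) / len(others)


print(closest_pair())                               # ('A', 'B')
print(min(SPECIES, key=mean_identity_to_others))    # Outgroup
```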
Constructing Your Cladogram
Use your similarity matrix to place species on a cladogram:
- The outgroup goes on the first branch, separated from all ingroup species.
- The two most similar species (highest percent identity) share the most recent common ancestor, so they are joined at the node closest to the branch tips.
- Work outward from there based on decreasing similarity.
Remember: nodes represent common ancestors, not the organisms themselves. The node connecting A and B represents the ancestor they share. The node connecting (A+B) with C represents an older ancestor shared by all three.
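In the lab you place branches by hand, but the grouping steps above can be sketched as a simple clustering routine. The example below is an assumption-laden illustration: it converts percent identity to percent difference, repeatedly joins the two closest groups using average distance (an UPGMA-style approach, which assumes roughly equal rates of change in every lineage), and prints the result in Newick format, a standard text notation where each pair of parentheses is a node. The values come from the example similarity matrix, and the function names are made up.

```python
from itertools import combinations

# Percent identity for each pair, taken from the example matrix.
IDENTITY = {
    frozenset(["A", "B"]): 97,
    frozenset(["A", "C"]): 84,
    frozenset(["A", "Outgroup"]): 71,
    frozenset(["B", "C"]): 83,
    frozenset(["B", "Outgroup"]): 70,
    frozenset(["C", "Outgroup"]): 72,
}


def distance(x, y):
    """Percent difference between two species."""
    return 100 - IDENTITY[frozenset([x, y])]


def cluster_distance(group_1, group_2):
    """Average percent difference across all cross-group species pairs."""
    pairs = [(a, b) for a in group_1 for b in group_2]
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def build_tree(species):
    """Join the two closest groups until one remains; return a Newick string."""
    # Each cluster is (tuple of member species, Newick label so far).
    clusters = [((s,), s) for s in species]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        members = clusters[i][0] + clusters[j][0]
        label = f"({clusters[i][1]},{clusters[j][1]})"  # parentheses mark a node
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members, label))
    return clusters[0][1] + ";"


print(build_tree(["A", "B", "C", "Outgroup"]))  # (Outgroup,(C,(A,B)));
```

Reading the output from the inside out: `(A,B)` is the most recent node, `(C,(A,B))` is the older ancestor shared by all three ingroup species, and the outgroup joins last.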
Comparing Molecular and Morphological Evidence
A key analysis move in this lab is comparing what your molecular tree says to what morphological data suggests. If a species looks physically similar to another but your BLAST data shows low sequence identity, that is a signal worth investigating. It could mean convergent evolution, where similar traits evolved independently rather than from a shared ancestor.
Controls and Variables
- The controlled variable is the gene or sequence region you are comparing. You need to compare the same gene across all species for the data to be meaningful.
- The independent variable is the species being compared.
- The dependent variable is the percent identity (or sequence similarity score).
Connecting to the Molecular Clock
If your lab includes time estimates, you can use the molecular clock concept to estimate divergence times. If you know the mutation rate for the gene you are using, and you know the percent difference between two sequences, you can calculate roughly how long ago those lineages split. Fossil calibration points help anchor these estimates to real time.
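A rough version of that calculation can be sketched as follows. It assumes the rate is given as substitutions per site per year for a single lineage, so the observed difference is divided by twice the rate, because both lineages have been accumulating changes since the split. It also ignores repeated mutations at the same site, which makes it less accurate for very different sequences. The rate used here is a hypothetical placeholder, not a measured value for any real gene.

```python
def divergence_time_years(percent_identity, rate_per_site_per_year):
    """Estimate years since two lineages split, under a simple molecular clock.

    Assumes a constant rate per lineage and no repeated changes at one site.
    """
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    fraction_different = (100.0 - percent_identity) / 100.0
    # Both lineages accumulate changes, so the pair diverges at twice the rate.
    return fraction_different / (2.0 * rate_per_site_per_year)


# Hypothetical rate, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_RATE = 1e-8  # substitutions per site per year, per lineage

estimate = divergence_time_years(95.0, HYPOTHETICAL_RATE)
print(f"{estimate:,.0f} years")  # about 2.5 million years under these assumptions
```

Note that the function takes percent identity and converts it to percent difference internally, which is exactly where the identity/difference mix-up tends to cause errors.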
Common Mistakes
Confusing sequence identity with physical similarity. High percent identity means the sequences are nearly the same at those positions. It does not mean the organisms are identical or even that they look alike. Molecular similarity is about shared ancestry, not shared appearance.
Mixing up cladograms and phylogenetic trees. On the AP exam, this distinction matters. A cladogram shows branching pattern only. A phylogenetic tree adds information about time or the amount of evolutionary change. Do not use these terms interchangeably.
Misreading nodes. Students often think the node itself represents a living species. It does not. A node represents a hypothetical common ancestor, which may be extinct. The organisms at the tips of the branches are the ones you are comparing.
Ignoring the outgroup. The outgroup is not just filler. It defines what counts as ancestral versus derived. If you do not correctly identify and place the outgroup, your whole tree can be wrong.
Assuming morphological similarity means close relationship. This is the convergent evolution trap. Dolphins and sharks look similar, but molecular data shows they are not closely related. Always check molecular evidence before concluding that similar-looking organisms are closely related.
Treating the tree as fact. Phylogenetic trees and cladograms are hypotheses (EK 7.9.B.3). They represent the best interpretation of current evidence. New data can and does change them. The AP exam may ask you to explain how new evidence would affect an existing tree.
Confusing percent identity with percent difference. If two sequences are 95% identical, they are 5% different. Make sure you know which number you are working with when making comparisons or calculations.
Quick Review Checklist
- BLAST compares DNA or protein sequences to infer evolutionary relatedness based on percent identity between species
- Higher percent identity between two species = more similar sequences = more recent shared common ancestor
- Cladograms show branching pattern only; phylogenetic trees also show time scale or amount of evolutionary change
- Nodes on a tree represent the most recent common ancestor of the lineages branching from that point
- The outgroup is the least closely related lineage and is used as a reference to identify ancestral vs. derived traits
- Molecular data (like DNA sequence comparisons) is generally more reliable than morphological data because convergent evolution can make distantly related species look similar
- The molecular clock uses mutation rates to estimate divergence times, often calibrated with fossil evidence
- Phylogenetic trees and cladograms are hypotheses that can be revised as new molecular, morphological, or fossil evidence becomes available