General Biology II

Key Concepts in Phylogenetic Tree Construction

Why This Matters

Phylogenetic trees are the roadmaps of evolutionary biology—they show you how life diversified from common ancestors into the millions of species we see today. On the AP exam, you're being tested on your ability to read, interpret, and construct these trees, which means understanding not just what they show, but how scientists build them and why certain methods produce more reliable results than others.

The concepts here connect directly to core principles you'll see throughout the course: common ancestry, natural selection, genetic variation, and molecular evidence for evolution. When you encounter phylogenetic trees on the exam, don't just memorize branch patterns—know what each node represents, why branch lengths matter, and how scientists distinguish true evolutionary relationships from misleading similarities. Master these concepts, and you'll be ready for any tree-based FRQ or multiple-choice question they throw at you.

Tree Structure and Terminology

Before you can analyze evolutionary relationships, you need to understand the basic architecture of phylogenetic trees. Every element—nodes, branches, tips—carries specific evolutionary meaning.

Definition of Phylogenetic Trees

Graphical representations of evolutionary relationships—each branching point shows where lineages diverged from a common ancestor
Tips (terminal nodes) represent the taxa being compared, usually living species or groups but sometimes extinct ones, while internal nodes represent inferred (hypothetical) common ancestors
Essential for visualizing descent with modification, the central pattern that evolutionary biology sets out to explain

Rooted vs. Unrooted Trees

Rooted trees show directionality—they indicate which ancestor came first and how evolution proceeded forward in time
Unrooted trees display relationships without specifying the common ancestor, showing only how closely related groups are to each other
Rooting is usually done with an outgroup (alternatives include midpoint rooting or a molecular clock), so rooted trees are more informative but only as reliable as the choice of root

Cladograms vs. Phylograms

Cladograms focus purely on branching order—branch lengths are meaningless and only show who is related to whom
Phylograms encode additional data through branch lengths, which are proportional to the amount of evolutionary change; trees whose branch lengths are scaled to time are usually called chronograms (time trees)
Choose based on your question: cladograms for relationships, phylograms for the amount or rate of change, time-scaled trees for divergence timing
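
To make the cladogram vs. phylogram distinction concrete, here is a minimal Python sketch that writes the same tree in Newick notation twice: once with topology only and once with branch lengths. The taxa, the branch lengths, and the to_newick helper are all hypothetical and chosen just for illustration.

```python
# Each node is (name, branch_length, children).
# Taxa and branch lengths are made up for illustration.
tree = ("", 0.0, [
    ("Lizard", 0.30, []),
    ("", 0.10, [
        ("Crocodile", 0.15, []),
        ("Bird", 0.25, []),
    ]),
])


def to_newick(node, with_lengths):
    """Serialize a node; include branch lengths only when asked."""
    name, length, children = node
    text = name
    if children:
        inner = ",".join(to_newick(child, with_lengths) for child in children)
        text = "(" + inner + ")" + name
    if with_lengths and length > 0:
        text += f":{length:.2f}"
    return text


cladogram = to_newick(tree, with_lengths=False) + ";"  # branching order only
phylogram = to_newick(tree, with_lengths=True) + ";"   # branching order + change
print(cladogram)
print(phylogram)
```

Both strings describe identical relationships; only the second one carries the extra information that a phylogram draws as branch length.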

Compare: Rooted trees vs. cladograms—both show branching patterns, but rooted trees specify direction while cladograms may or may not. If an FRQ asks you to "trace the evolutionary history," you need a rooted tree; if it asks "which species are most closely related," either works.

Grouping Species Correctly

One of the biggest challenges in phylogenetics is ensuring your groups actually reflect evolutionary history. A valid phylogenetic group must be defined by shared ancestry, not superficial similarity.

Monophyletic, Paraphyletic, and Polyphyletic Groups

Monophyletic groups (clades) include an ancestor and all its descendants—these are the only truly valid evolutionary groups
Paraphyletic groups exclude some descendants (like "reptiles" excluding birds), making them taxonomically convenient but evolutionarily incomplete
Polyphyletic groups lump together distantly related organisms on the basis of convergent traits and leave out their most recent common ancestor, so they misrepresent evolutionary history
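
One way to test whether a proposed group is monophyletic is to ask whether its members are exactly the tips below some single node. The sketch below assumes a deliberately simplified reptile tree written as nested tuples; the tree shape and the helper names (leaf_sets, is_monophyletic) are illustrative assumptions, not a standard library API.

```python
# Simplified, hypothetical tree: each tuple is an internal node, each string a tip.
tree = ("Lizard", ("Turtle", ("Crocodile", "Bird")))


def leaf_sets(node, found):
    """Collect the set of tips under every node of the tree into `found`."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*(leaf_sets(child, found) for child in node))
    found.add(leaves)
    return leaves


def is_monophyletic(tree, group):
    """A group is a clade if some node has exactly these tips as descendants."""
    found = set()
    leaf_sets(tree, found)
    return frozenset(group) in found


archosaurs = is_monophyletic(tree, {"Crocodile", "Bird"})
reptiles_without_birds = is_monophyletic(tree, {"Lizard", "Turtle", "Crocodile"})
print(archosaurs, reptiles_without_birds)
```

In this toy tree, crocodiles plus birds form a clade, while "reptiles" without birds do not, because no single node has exactly those three tips beneath it.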

Outgroups and Their Importance

An outgroup is a reference species that diverged before the group you're studying, providing a baseline for comparison
Critical for rooting trees and polarizing characters: states shared with the outgroup are inferred to be ancestral, while states found only within the study group are inferred to be derived
Choose outgroups carefully: they must be related enough to compare meaningfully but divergent enough to clearly fall outside your study group

Character States: Ancestral vs. Derived

Ancestral (plesiomorphic) traits were present in the common ancestor and don't tell you about relationships within the group
Derived (apomorphic) traits arose within the group after it split from its relatives; derived traits shared by two or more lineages are called synapomorphies
Only synapomorphies define clades; shared ancestral traits (symplesiomorphies) can mislead you into forming paraphyletic groups
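
Outgroup comparison can be sketched in a few lines of Python. The example assumes a small hypothetical character matrix (1 = trait present, 0 = absent), treats the outgroup's state as ancestral, and lists which ingroup taxa share each derived state. The taxa, characters, and the derived_sharers helper are assumptions made for illustration.

```python
characters = ["jaws", "four limbs", "amniotic egg", "hair"]

# Hypothetical character matrix: rows are taxa, columns follow `characters`.
matrix = {
    "Lamprey": [0, 0, 0, 0],  # outgroup
    "Shark":   [1, 0, 0, 0],
    "Frog":    [1, 1, 0, 0],
    "Lizard":  [1, 1, 1, 0],
    "Mouse":   [1, 1, 1, 1],
}
outgroup = "Lamprey"


def derived_sharers(matrix, characters, outgroup):
    """For each character, list ingroup taxa whose state differs from the outgroup."""
    result = {}
    for i, name in enumerate(characters):
        ancestral = matrix[outgroup][i]  # assumption: outgroup state is ancestral
        result[name] = sorted(
            taxon for taxon, states in matrix.items()
            if taxon != outgroup and states[i] != ancestral
        )
    return result


sharers = derived_sharers(matrix, characters, outgroup)
for name, taxa in sharers.items():
    print(name, taxa)
```

Here the amniotic egg is a candidate synapomorphy uniting Lizard and Mouse, while hair is derived in only one sampled taxon, so it says nothing about how the sampled groups are related to each other.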

Compare: Monophyletic vs. paraphyletic groups—both include a common ancestor, but monophyletic groups include all descendants while paraphyletic groups arbitrarily exclude some. Classic exam example: "Reptilia" is paraphyletic because it excludes birds, which evolved from reptilian ancestors.

Distinguishing True Relationships from False Signals

Not all similarities indicate common ancestry. Convergent evolution can produce strikingly similar traits in unrelated species, creating traps for phylogenetic analysis.

Homology vs. Homoplasy

Homologous traits share common ancestry—the forelimbs of bats, whales, and humans are homologous despite different functions
Homoplastic traits evolved independently through convergent evolution, parallel evolution, or reversal: bat and bird wings are homoplastic as wings, even though the forelimb bones inside them are homologous
Distinguishing these is critical: homology supports phylogenetic grouping, while homoplasy creates false signals that must be identified and excluded

Reading and Interpreting Phylogenetic Trees

Internal nodes represent speciation events: the point where one ancestral population split into two distinct lineages
Sister groups share an immediate common ancestor and are each other's closest relatives, regardless of how different they look
Tree rotation doesn't change relationships—branches can spin around nodes without altering evolutionary meaning, so focus on connections, not left-right position
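
The rotation rule can be checked mechanically: if you sort the children at every node, two trees that differ only by rotation collapse to the same canonical form. This is a minimal sketch using nested tuples and a hypothetical canonical helper; the taxa are placeholders.

```python
def canonical(node):
    """Return the tree with children sorted at every node, erasing rotations."""
    if isinstance(node, str):
        return node
    return tuple(sorted((canonical(child) for child in node), key=repr))


tree_a = (("Human", "Chimp"), ("Mouse", "Rat"))
tree_b = (("Rat", "Mouse"), ("Chimp", "Human"))   # tree_a with branches rotated
tree_c = (("Human", "Mouse"), ("Chimp", "Rat"))   # genuinely different grouping

same_ab = canonical(tree_a) == canonical(tree_b)
same_ac = canonical(tree_a) == canonical(tree_c)
print(same_ab, same_ac)
```

tree_a and tree_b look different on the page but encode the same sister groups; tree_c changes who is paired with whom, so it is a different hypothesis.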

Compare: Homology vs. homoplasy—both produce similar structures, but homology reflects shared ancestry while homoplasy reflects similar selective pressures. On FRQs about convergent evolution, homoplasy is your go-to example; for questions about evidence for common descent, use homology.

Methods for Building Trees

Scientists use different analytical approaches depending on their data and goals. Each method makes different assumptions and has distinct strengths and limitations.

Parsimony Principle

Favors the simplest explanation: the tree requiring the fewest evolutionary changes is preferred
Based on Occam's Razor: don't assume more evolutionary events than necessary to explain the data
Limitations: can be misled when evolution is not parsimonious; rapid evolution or convergence on long branches can make the simplest tree wrong (long-branch attraction)
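
Counting changes on a tree is commonly done with Fitch's algorithm, sketched below for rooted binary trees written as nested tuples. The four short sequences and the tree labels are hypothetical; the sketch scores each of the three possible groupings of four taxa and picks the one needing the fewest changes.

```python
# Hypothetical aligned sequences, one character per site.
sequences = {"A": "AAGT", "B": "AAGC", "C": "GTGT", "D": "GTAC"}
columns = [{taxon: seq[i] for taxon, seq in sequences.items()} for i in range(4)]


def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, columns):
    """Sum the minimum number of changes over all sites."""
    return sum(fitch(tree, column)[1] for column in columns)


trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(tree, columns) for name, tree in trees.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With these made-up sequences the grouping AB|CD needs 5 changes, AC|BD needs 6, and AD|BC needs 7, so parsimony prefers AB|CD.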

Maximum Likelihood Method

Statistical approach that calculates the probability of observing your data given each possible tree
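Selects the tree with the highest likelihood, meaning the tree under which the observed data are most probable (not the tree that is itself most probable); more computationally intensive than parsimony but handles complex evolutionary scenarios better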
Requires a model of evolution specifying how characters change over time, making results dependent on model accuracy
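
A full likelihood tree search is beyond a short example, but the core idea can be shown on the smallest possible problem: estimating the evolutionary distance between two sequences. The sketch below assumes the Jukes-Cantor (JC69) model, in which all substitutions are equally likely, and uses hypothetical counts (100 aligned sites, 10 differences). It finds the distance that maximizes the likelihood by a simple grid search and compares it with the known closed-form JC69 distance.

```python
import math


def jc69_log_likelihood(distance, n_sites, n_diffs):
    """Log-likelihood of two sequences separated by `distance` under JC69."""
    # Probability that a site differs between the two sequences.
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))
    n_same = n_sites - n_diffs
    return (n_sites * math.log(0.25)               # base frequency at each site
            + n_same * math.log(1.0 - p_diff)      # sites that match
            + n_diffs * math.log(p_diff / 3.0))    # sites with one specific change


n_sites, n_diffs = 100, 10  # hypothetical data

# Try distances from 0.001 to 2.000 substitutions per site.
grid = [i / 1000 for i in range(1, 2001)]
ml_distance = max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diffs))

# Closed-form JC69 estimate for comparison.
analytic = -0.75 * math.log(1.0 - (4.0 / 3.0) * (n_diffs / n_sites))
print(ml_distance, round(analytic, 4))
```

The estimated distance is slightly larger than the raw 10% difference because the model accounts for multiple substitutions at the same site. A different model would give a different answer, which is why model choice matters.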

Bayesian Inference

Incorporates prior knowledge about evolutionary processes and updates probabilities as data is analyzed
Produces posterior probabilities for trees and clades, giving you a measure of confidence (given the data, model, and priors) rather than a single "best" tree
Handles uncertainty explicitly—particularly valuable when data is limited or conflicting signals exist
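
The Bayesian update itself is just Bayes' theorem applied to trees: posterior is proportional to prior times likelihood. The sketch below assumes three candidate trees with made-up log-likelihoods and equal priors. Real Bayesian programs sample trees and parameters with MCMC rather than enumerating them, so treat this as an illustration of the arithmetic only.

```python
import math

# Hypothetical log-likelihoods for three candidate trees.
log_likelihoods = {
    "((A,B),C)": -1000.0,
    "((A,C),B)": -1002.0,
    "((B,C),A)": -1005.0,
}
# Assumption: no tree is favored before seeing the data.
priors = {tree: 1.0 / 3.0 for tree in log_likelihoods}


def posterior_probabilities(log_likelihoods, priors):
    """Normalize prior * likelihood, working in log space to avoid underflow."""
    log_joint = {t: ll + math.log(priors[t]) for t, ll in log_likelihoods.items()}
    peak = max(log_joint.values())
    weights = {t: math.exp(v - peak) for t, v in log_joint.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


posteriors = posterior_probabilities(log_likelihoods, priors)
for tree, prob in posteriors.items():
    print(tree, round(prob, 4))
```

The output is a probability for every candidate tree, which is what lets you say how much more credible one tree is than another instead of reporting only a winner.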

Compare: Parsimony vs. maximum likelihood: parsimony minimizes the total number of changes, while maximum likelihood maximizes the probability of the data under an explicit model of evolution. Parsimony is faster and intuitive; maximum likelihood is more rigorous for molecular data. If an FRQ asks about choosing methods, mention that molecular datasets typically favor likelihood approaches.

Assessing Confidence and Timing

Building a tree is only half the battle—you also need to know how much to trust it and what it tells you about when events occurred.

Bootstrap Analysis

Resampling technique that tests tree reliability by randomly sampling your data with replacement hundreds or thousands of times
Bootstrap values indicate what percentage of resampled datasets support each branch; values of about 70% or higher are often treated as reasonably well supported, and values of 95% or higher as strong support
Does not prove a tree is correct—only shows whether your data consistently supports the same relationships
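
Here is a minimal sketch of the resampling step for four taxa. It assumes a hypothetical 20-site alignment and a deliberately simple tree-picking rule (the grouping supported by the most sites wins, and ties count for no grouping), then reports the percentage of replicates supporting each grouping. The helper names, the alignment, and the seed are all illustrative choices.

```python
import random

# Hypothetical alignment: 8 sites favor AB|CD, 2 favor AC|BD, 10 are invariant.
seqs = {
    "A": "A" * 8 + "C" * 2 + "T" * 10,
    "B": "A" * 8 + "T" * 2 + "T" * 10,
    "C": "G" * 8 + "C" * 2 + "T" * 10,
    "D": "G" * 8 + "T" * 2 + "T" * 10,
}
columns = [dict(zip("ABCD", states)) for states in zip(*(seqs[t] for t in "ABCD"))]

SPLITS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def best_split(columns):
    """Return the grouping supported by the most sites, or None on a tie."""
    counts = {name: 0 for name in SPLITS}
    for col in columns:
        for name, ((p, q), (r, s)) in SPLITS.items():
            if col[p] == col[q] and col[r] == col[s] and col[p] != col[r]:
                counts[name] += 1
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0] if ranked[0][1] > ranked[1][1] else None


def bootstrap_support(columns, replicates, seed):
    """Resample sites with replacement and tally the winning grouping."""
    rng = random.Random(seed)
    wins = {name: 0 for name in SPLITS}
    for _ in range(replicates):
        sample = [rng.choice(columns) for _ in columns]  # same length as original
        winner = best_split(sample)
        if winner is not None:
            wins[winner] += 1
    return {name: 100.0 * n / replicates for name, n in wins.items()}


support = bootstrap_support(columns, replicates=200, seed=1)
print(support)
```

Notice that every replicate is built from the same original sites, so high support means the data are internally consistent, not that the tree is guaranteed to be true.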

Molecular Clock Hypothesis

Assumes substitutions accumulate at a roughly constant rate over time; if true, genetic differences between species reflect time since divergence
Allows dating of evolutionary events by calibrating mutation rates against known fossil dates
Major limitation: mutation rates vary across lineages and genes, so relaxed molecular clocks that allow rate variation are now preferred
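
The arithmetic behind a strict molecular clock is short enough to work through. The sketch below assumes a hypothetical calibration pair (genetic distance 0.02 substitutions per site, fossil-dated split 10 million years ago) and uses it to date a second pair. The factor of 2 is there because both lineages accumulate changes after the split. All numbers are invented for illustration.

```python
def calibrate_rate(distance, divergence_time_my):
    """Substitutions per site per million years along a single lineage."""
    return distance / (2.0 * divergence_time_my)


def estimate_divergence(distance, rate):
    """Time since divergence in million years, assuming a constant rate."""
    return distance / (2.0 * rate)


# Hypothetical calibration: two species differ at 2% of sites and a fossil
# dates their split to 10 million years ago.
rate = calibrate_rate(distance=0.02, divergence_time_my=10.0)

# Hypothetical pair to be dated: they differ at 5% of sites.
estimated_time = estimate_divergence(distance=0.05, rate=rate)
print(rate, estimated_time)
```

Under these assumptions the rate is 0.001 substitutions per site per million years and the second pair split about 25 million years ago. If the true rate differs between lineages, this estimate will be off, which is the motivation for relaxed clocks.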

Interpreting Branch Lengths

In phylograms, length = evolutionary change—longer branches indicate more mutations or more time, depending on the analysis
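Short branches suggest little change, as in rapid diversification or recent divergence; long branches indicate a lot of change and can cause long-branch attraction, an artifact in which fast-evolving lineages are wrongly grouped together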
Always check what branch lengths represent in a given tree—time and genetic change are not the same thing

Compare: Bootstrap values vs. Bayesian posterior probabilities: both measure confidence, but bootstrap values come from resampling the data while posteriors come from combining the likelihood with priors under an explicit model. Bayesian posteriors are often higher for the same data, so don't directly compare numbers across methods.

Constructing Trees from Data

Understanding how raw data becomes a phylogenetic tree helps you critically evaluate the trees you encounter on exams and in research.

Constructing Phylogenetic Trees from Character Matrices

Character matrices organize trait data in rows (species) and columns (characters), with each cell containing a character state
Molecular data (DNA, protein sequences) now dominates phylogenetics because it provides abundant, objective characters
Character selection matters enormously—choosing uninformative or homoplastic characters produces unreliable trees regardless of method
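
One common screen for uninformative characters under parsimony is to keep only parsimony-informative sites: columns in which at least two different states each appear in at least two taxa. The sketch below applies that rule to a small hypothetical matrix; the sequences and the informative_columns helper are assumptions for illustration.

```python
from collections import Counter

# Hypothetical character matrix: rows are taxa, each position is a character.
matrix = {"A": "AAGT", "B": "AAGC", "C": "GTGT", "D": "GTAC"}


def informative_columns(matrix):
    """Return indices of columns where 2+ states each occur in 2+ taxa."""
    taxa = list(matrix)
    n_columns = len(matrix[taxa[0]])
    keep = []
    for i in range(n_columns):
        counts = Counter(matrix[taxon][i] for taxon in taxa)
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            keep.append(i)
    return keep


informative = informative_columns(matrix)
print(informative)
```

Column 2 (G, G, G, A) is dropped because the odd state appears in only one taxon, so it costs one change on every possible tree and cannot help parsimony choose between them. This screen does not detect homoplasy, which still has to be judged from the biology or from conflict among characters.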

Quick Reference Table

Concept	Key Terms
Tree types	Rooted vs. unrooted, cladograms vs. phylograms
Valid groupings	Monophyletic groups (clades), synapomorphies
Problematic groupings	Paraphyletic groups, polyphyletic groups (often built on homoplasy)
Tree-building methods	Parsimony, maximum likelihood, Bayesian inference
Confidence assessment	Bootstrap analysis, posterior probabilities
Temporal analysis	Molecular clock, branch length interpretation
Character analysis	Ancestral vs. derived states, homology vs. homoplasy
Data organization	Character matrices, outgroup selection

Self-Check Questions

What is the key difference between a monophyletic group and a paraphyletic group, and why does this distinction matter for accurate classification?
You're given a phylogenetic tree where two distantly related species both have wings. How would you determine whether this similarity represents homology or homoplasy, and what would each conclusion imply?
Compare parsimony and maximum likelihood as tree-building methods—under what circumstances might they produce different results, and which would you trust more for a large molecular dataset?
A branch on a phylogenetic tree has a bootstrap value of 95%. What does this tell you, and what does it not tell you about the evolutionary relationship shown?
If you were constructing a phylogenetic tree for a group of mammals and needed to select an outgroup, what criteria would guide your choice, and why is this step essential for interpreting the tree correctly?