Token prediction is the core mechanism of large language models (LLMs): the model generates text by repeatedly predicting a likely next token (a word or piece of a word) based on patterns in its training data, not by looking up verified facts. In AP Seminar, it explains why AI output sounds confident but isn't credible evidence on its own.
Token prediction is the fundamental mechanism behind large language models (LLMs) like ChatGPT, Claude, and Gemini. A token is a chunk of text, usually a word or part of a word. The model reads everything written so far, estimates how likely each possible next token is, and picks a likely one, then the next, then the next, building sentences one piece at a time. It learned those probabilities from massive amounts of training text. Think of it as extremely sophisticated autocomplete, not a search engine or a brain.
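To make the "autocomplete" idea concrete, here is a minimal sketch of next-token prediction in Python. It is a toy under loud assumptions: the three-sentence corpus is made up, tokens are whole words, and the "model" is just a table counting which word follows which. Real LLMs use neural networks, word-fragment tokens, and far more text, and they often sample among likely tokens instead of always taking the top one. The names train_bigram and generate are illustrative, not part of any real tool.

```python
from collections import Counter, defaultdict

# Tiny made-up "training data"; a real LLM learns from vastly more text.
corpus = "the cat sat on the mat . the cat sat on the rug . the dog ate the fish ."

def train_bigram(text):
    """Count which token follows which (tokens here are whole words)."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_tokens=6):
    """Build text one token at a time, always appending the most frequent follower."""
    output = [start]
    for _ in range(max_tokens):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # nothing ever followed this token in the training text
        next_token, _count = candidates.most_common(1)[0]
        output.append(next_token)
    return " ".join(output)

model = train_bigram(corpus)
print(generate(model, "the"))  # prints: the cat sat on the cat sat
```

Notice that the output reads smoothly but is not a sentence from the training text, and nothing in the code checks whether it is true. The toy only knows which words tend to come next.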
This matters for AP Seminar because it tells you what an LLM is actually doing when it writes. The model is not retrieving facts from a verified database or reasoning about truth. It is producing the text that looks most like a good answer. That is why LLMs can hallucinate, meaning they confidently generate fake statistics, fake quotes, and fake citations that pass the eye test. The output is fluent because fluency is exactly what token prediction optimizes for. Accuracy is a side effect, not a guarantee.
AP Seminar is built around evaluating sources and evidence, especially in Big Idea 1 (Question and Explore) and Big Idea 2 (Understand and Analyze). Token prediction is the concept that lets you apply those skills to AI. Once you know an LLM is predicting plausible text rather than verifying truth, you can explain why AI-generated content fails credibility checks like RAVEN (Reputation, Ability to observe, Vested interest, Expertise, Neutrality). It has no author with expertise, no transparent sourcing, and no commitment to accuracy. This also connects directly to College Board's AI policy for the performance tasks. The Individual Research Report (IRR) and Individual Written Argument (IWA) must reflect your own research and argument, and every claim needs a real, checkable source. Understanding token prediction is your defense against accidentally building an argument on fabricated evidence. It is also increasingly likely to show up as subject matter, since AI ethics and AI literacy are exactly the kind of cross-disciplinary issues Seminar stimulus materials and student research questions gravitate toward.
Large language model (LLM) (Big Idea 1)
Token prediction is the engine and the LLM is the car. An LLM is a giant neural network trained to do token prediction at enormous scale, so most of the strengths and weaknesses of tools like ChatGPT trace back to this one mechanism.
Evidence (Big Idea 2)
AP Seminar defines strong evidence as relevant, credible, and verifiable. Token prediction explains why raw AI output fails that test. The model generates what is likely, not what is true, so anything it tells you must be traced back to a real source before it can support your argument.
Coherence (Big Idea 4)
LLM text is a useful warning that coherence is not the same as soundness. Token prediction produces smooth, well-connected prose even when the underlying claims are wrong, which is exactly why graders score your reasoning and sourcing, not just your flow.
Informed consent (Big Idea 1)
LLMs are trained on huge datasets of human writing, much of it scraped without the authors' knowledge. That raises the same research-ethics questions about consent and data use you encounter when evaluating studies, and it makes a strong lens for an IRR or IWA on AI ethics.
AP Seminar doesn't quiz you on definitions, so you won't see a multiple-choice question asking what token prediction is. Instead, the term earns its keep in three places. First, the End-of-Course Exam could hand you a stimulus source about AI, and knowing how LLMs actually work lets you analyze the author's argument and evaluate their evidence with real precision. Second, if your team's IRR or your IWA touches on AI, technology, or ethics, token prediction gives you the mechanical explanation that turns a vague claim like 'AI is unreliable' into a specific, well-reasoned line of analysis. Third, and most practically, it shapes how you use AI tools during the performance tasks. College Board allows generative AI only as a supplementary aid, and your submitted work must be your own with sources you have actually verified. Understanding that the model predicts plausible text rather than retrieving facts is the reason you check every citation an AI hands you.
Googling retrieves existing documents written by real authors that you can evaluate for credibility. Token prediction generates text on the spot, assembled token by token from statistical patterns rather than pulled from a document you can look up. A search result has a source you can vet with RAVEN. An LLM's answer has no source at all unless you trace its claims yourself, which is why citing 'ChatGPT said so' fails as evidence in Seminar.
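The contrast between retrieving and generating can be sketched in a few lines of Python. Everything here is hypothetical: the three-entry LIBRARY stands in for the web, "Author A" and friends are placeholder labels rather than real people, and the word-count model is a drastic simplification of an LLM. The point is only what each function hands back.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Document:
    author: str  # placeholder label, not a real person
    text: str

# Hypothetical mini "web": every entry has an author you could evaluate.
LIBRARY = [
    Document("Author A", "teens need regular sleep"),
    Document("Author B", "students need regular breaks"),
    Document("Author C", "adults need regular sleep"),
]

def search(query):
    """Retrieval: return the existing documents that contain the query."""
    return [doc for doc in LIBRARY if query in doc.text]

def build_model(docs):
    """Count which word follows which across all documents (authors are discarded)."""
    follows = defaultdict(Counter)
    for doc in docs:
        words = doc.text.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=5):
    """Generation: extend the text with the most frequent next word each time."""
    words = [start]
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

follows = build_model(LIBRARY)
answer = generate(follows, "students")
print([doc.author for doc in search("regular sleep")])  # ['Author A', 'Author C']
print(answer)          # students need regular sleep
print(search(answer))  # [] because no document actually says this
```

The search returns documents with authors attached. The generated sentence sounds reasonable, yet no document in the library contains it, so there is nothing to cite until you find a real source that supports the claim.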
Token prediction means an LLM generates text by repeatedly predicting a statistically likely next token, not by looking up or verifying facts.
Because the model optimizes for plausible-sounding text, it can hallucinate fake statistics, quotes, and citations that look completely real.
AI output is not credible evidence in AP Seminar; every claim an LLM gives you must be traced to a real, verifiable source before it goes in your IRR or IWA.
Fluent and coherent does not mean true, which is exactly why Seminar scores your sourcing and reasoning rather than how smooth your prose sounds.
College Board permits generative AI only as a supplementary tool on performance tasks, so understanding token prediction helps you use AI responsibly without letting it replace your own work.
Token prediction is the mechanism large language models use to generate text, predicting the most likely next word based on patterns from training data. In Seminar, it explains why AI-generated text sounds polished but doesn't count as credible evidence.
ChatGPT does not reason about truth or meaning the way a person does. It produces text through token prediction, choosing a statistically likely next word each time. The output can be fluent and confident while being factually wrong, which is why hallucinated facts and fake citations happen.
Google retrieves real documents written by real authors that you can evaluate with credibility checks like RAVEN. Token prediction generates new text that has no author and no source, so an LLM's answer can't be vetted the way a search result can.
Generative AI may be used on the performance tasks only as a supplementary aid. College Board's policy requires that performance tasks reflect your own research, analysis, and writing, and you complete in-class checkpoints to confirm authorship. Anything an AI tells you must be independently verified with real sources before it appears in your work.
LLMs produce fake citations because token prediction generates what a citation typically looks like, not a record of a real publication. The model assembles plausible author names, titles, and journals from patterns in its training data, so fabricated references can look completely legitimate until you try to find them.