🤟🏼Natural Language Processing Unit 2 Review

2.1 N-grams and language models

Written by the Fiveable Content Team • Last updated August 2025

N-grams are the building blocks of language models, helping predict word sequences. They capture local context by analyzing contiguous word or character groups, with higher-order n-grams providing more context but requiring more data.

Language models use n-grams to estimate word probabilities based on previous words. This approach simplifies modeling by assuming a word's likelihood depends only on its recent history, making it computationally feasible for various NLP tasks.

N-grams for Language Modeling

Understanding N-grams

  • N-grams represent contiguous sequences of n items (words or characters) extracted from a text corpus
    • n is a positive integer that determines the length of the sequence
    • Unigrams (1-grams) represent single items, such as individual words
    • Bigrams (2-grams) represent pairs of adjacent items, such as two-word phrases
    • Trigrams (3-grams) represent triples of adjacent items, such as short sentence fragments
  • N-grams capture the local context and statistical dependencies between words or characters in a language
    • They provide information about the likelihood of words or characters appearing together
    • Higher-order n-grams (larger n) capture more context but require more training data (see the extraction sketch after this list)
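
To make this concrete, here is a minimal sketch of extracting word-level n-grams from a tokenized sentence (plain Python; the helper name extract_ngrams and the example sentence are illustrative, not from any particular library):

```python
from collections import Counter

def extract_ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) from a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()

unigrams = extract_ngrams(tokens, 1)  # [('the',), ('cat',), ('sat',), ...]
bigrams = extract_ngrams(tokens, 2)   # [('the', 'cat'), ('cat', 'sat'), ...]
trigrams = extract_ngrams(tokens, 3)  # [('the', 'cat', 'sat'), ...]

# Frequency counts over a large corpus are the raw material for an n-gram model
print(Counter(bigrams).most_common(3))
```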

Role of N-grams in Language Modeling

  • Language models assign probabilities to sequences of words or characters
    • They enable the prediction of the likelihood of a given sequence occurring in a language
    • Language models are used in various NLP tasks (machine translation, speech recognition, text generation)
  • N-gram language models estimate the probability of a word or character given the previous n-1 words or characters
    • The estimation is based on the frequency counts of n-grams in the training corpus
    • The Markov assumption simplifies language modeling by assuming that the probability of a word depends only on the previous n-1 words, rather than the entire history
    • This assumption reduces the complexity of the model and makes it computationally tractable (the approximation is written out below this list)
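
In symbols, this approximation (a standard formulation, stated here for completeness) reads:

P(w_i | w_1, w_2, ..., w_{i-1}) \approx P(w_i | w_{i-n+1}, ..., w_{i-1})

so a bigram model (n = 2) conditions each word only on the single preceding word, P(w_i | w_{i-1}).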

Training N-gram Models

Counting and Probability Estimation

  • Training an n-gram language model involves counting the frequencies of n-grams in a text corpus
    • The counts are normalized to obtain probability estimates
    • Maximum likelihood estimation (MLE) is a simple method for estimating n-gram probabilities
      • MLE divides the count of an n-gram by the count of its prefix (n-1)-gram
      • Example: P(w_n | w_{n-1}) = \frac{count(w_{n-1}, w_n)}{count(w_{n-1})}
  • Smoothing techniques address the sparsity problem in n-gram models (see the sketch after this list)
    • Some n-grams may have zero or low counts in the training data, leading to unreliable probability estimates
    • Add-one smoothing (Laplace smoothing) adds a small constant (usually 1) to the count of each n-gram to avoid zero probabilities
    • Good-Turing smoothing adjusts the counts of low-frequency n-grams based on the count of n-grams with one higher frequency
    • Kneser-Ney smoothing considers the contexts in which n-grams occur and assigns higher probabilities to n-grams that appear in diverse contexts
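
As a rough sketch of the counting and estimation steps (the toy two-sentence corpus, the function names, and the simple whitespace tokenization are illustrative assumptions):

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
sentences = [s.split() for s in corpus]

unigram_counts = Counter(w for s in sentences for w in s)
bigram_counts = Counter(b for s in sentences for b in zip(s, s[1:]))
V = len(unigram_counts)  # vocabulary size

def mle_prob(prev, word):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def laplace_prob(prev, word):
    """Add-one (Laplace) smoothing: (count(prev, word) + 1) / (count(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)

print(mle_prob("the", "cat"))       # 0.25: seen bigram
print(mle_prob("the", "bird"))      # 0.0: unseen bigram gets zero probability
print(laplace_prob("the", "bird"))  # small but nonzero after smoothing
```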

Handling Unseen N-grams

  • Backoff and interpolation methods combine probabilities from lower-order n-gram models when higher-order n-grams are not available or have low counts
    • Backoff methods recursively fall back to lower-order models until a non-zero probability is found
    • Interpolation methods linearly combine the probabilities from models of different orders (see the sketch after this list)
  • Smoothing techniques also help in handling unseen n-grams by redistributing probability mass from seen n-grams to unseen ones
    • This allows the model to assign non-zero probabilities to unseen sequences
    • Smoothing helps in generalization and prevents the model from overfitting to the training data
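
Here is a minimal interpolation sketch under the same toy-corpus assumptions; the interpolation weight lam is illustrative and would normally be tuned on held-out data:

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
sentences = [s.split() for s in corpus]

unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(b for s in sentences for b in zip(s, s[1:]))
total_words = sum(unigrams.values())

def p_unigram(word):
    return unigrams[word] / total_words

def p_bigram(prev, word):
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def p_interpolated(prev, word, lam=0.7):
    """Linear interpolation: lam * P(word | prev) + (1 - lam) * P(word)."""
    return lam * p_bigram(prev, word) + (1 - lam) * p_unigram(word)

print(p_interpolated("mat", "the"))   # unseen bigram, but nonzero via the unigram term
print(p_interpolated("the", "bird"))  # 0.0: "bird" never appears, even as a unigram
```

A backoff scheme would instead use the bigram estimate whenever its count is nonzero and otherwise fall back to a (discounted) unigram estimate, rather than always mixing the two.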

Applying N-gram Models

Text Generation

  • N-gram language models can be used to generate text by iteratively predicting the next word or character based on the previous n-1 words or characters (see the sketch after this list)
    • The generation process starts with a seed or prompt and continues by sampling words or characters from the conditional probability distribution given by the n-gram model
    • The generated text follows the statistical patterns learned from the training corpus
    • The quality of the generated text depends on the order of the n-gram model and the diversity of the training data
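
A minimal generation loop under these assumptions (a toy bigram model trained on two sentences; the seed word and sampling via random.choices are illustrative choices):

```python
import random
from collections import Counter, defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[prev][word] += 1

def generate(seed, max_words=8):
    """Generate text by repeatedly sampling the next word from P(word | previous word)."""
    words = [seed]
    for _ in range(max_words):
        followers = bigram_counts.get(words[-1])
        if not followers:  # no known continuation for this word: stop early
            break
        candidates, counts = zip(*followers.items())
        words.append(random.choices(candidates, weights=counts)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the mat" (output varies between runs)
```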

Sequence Probability Estimation

  • The probability of a sequence of words or characters can be estimated by multiplying the conditional probabilities of each word or character given its preceding context
    • The chain rule of probability allows the joint probability of a sequence to be decomposed into a product of conditional probabilities (a worked sketch follows this list)
      • Example: P(w_1, w_2, ..., w_n) = P(w_1) \times P(w_2 | w_1) \times ... \times P(w_n | w_{n-1}, ..., w_1)
  • N-gram models can be used to estimate the likelihood of a given sentence or to compare the probabilities of different sentences
    • This is useful in tasks like language identification, spell checking, and speech recognition
    • The probability estimates can also be used as features in other NLP models
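
Here is a sketch of the chain-rule computation under a bigram approximation (the toy corpus, the add-one smoothing, and the use of log probabilities to avoid numerical underflow are all illustrative choices, not requirements from the text above):

```python
import math
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
sentences = [s.split() for s in corpus]

unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(b for s in sentences for b in zip(s, s[1:]))
total_words = sum(unigrams.values())
V = len(unigrams)

def laplace_bigram(prev, word):
    """Add-one smoothed P(word | prev), so unseen bigrams keep a small probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def sentence_log_prob(tokens):
    """Bigram chain rule: log P(w_1..w_n) ~= log P(w_1) + sum_i log P(w_i | w_{i-1})."""
    logp = math.log((unigrams[tokens[0]] + 1) / (total_words + V))  # smoothed first word
    for prev, word in zip(tokens, tokens[1:]):
        logp += math.log(laplace_bigram(prev, word))
    return logp

print(sentence_log_prob("the cat sat on the rug".split()))
print(sentence_log_prob("rug the on sat cat the".split()))  # same words, lower probability
```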

Evaluating N-gram Model Performance

Perplexity

  • Perplexity is a commonly used metric to evaluate the quality of a language model in terms of its ability to predict sequences
    • Perplexity measures how surprised or confused a language model is by a given test set, with lower perplexity indicating better performance
    • It is calculated as the exponential of the average negative log-likelihood of a test set under the language model
  • To compute perplexity, the test set is divided into n-grams, and the log-probability of each word or character given its context is calculated using the trained language model
    • The average negative log-likelihood is obtained by summing these log-probabilities, dividing by the total number of words or characters in the test set, and negating the result; exponentiating it gives the perplexity (see the sketch after this list)
    • Perplexity provides an intuitive measure of the language model's uncertainty, with lower perplexity indicating better predictive power
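
A minimal perplexity computation for a bigram model might look like the following (the toy training/test split, the add-one smoothing, and the omission of sentence-boundary markers are simplifying assumptions):

```python
import math
from collections import Counter

train = ["the cat sat on the mat", "the dog sat on the rug"]
test = ["the cat sat on the rug"]

sentences = [s.split() for s in train]
unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(b for s in sentences for b in zip(s, s[1:]))
V = len(unigrams)

def laplace_bigram(prev, word):
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def perplexity(test_sentences):
    """Exponential of the average negative log-likelihood over all predicted words."""
    total_logp, n_predictions = 0.0, 0
    for sentence in test_sentences:
        tokens = sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            total_logp += math.log(laplace_bigram(prev, word))
            n_predictions += 1
    return math.exp(-total_logp / n_predictions)

print(perplexity(test))  # lower values indicate a less "surprised" model
```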

Other Evaluation Metrics

  • Cross-entropy measures the average number of bits needed to encode each word or character using the model's probability distribution
    • Lower cross-entropy indicates better performance and a closer match between the model's predictions and the actual data (see the identity after this list)
  • The quality of generated text can be assessed through human evaluation
    • Metrics like coherence, fluency, and diversity of the generated samples can be rated by human annotators
    • Human evaluation provides a subjective assessment of the model's performance in generating meaningful and coherent text
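
For reference, cross-entropy (in bits per word, using base-2 logarithms) and perplexity are related by a standard identity not written out above:

H = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i | w_{i-n+1}, ..., w_{i-1}), \qquad \text{perplexity} = 2^{H}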

Limitations of N-gram Models

  • N-gram language models have limitations in capturing long-range dependencies
    • They rely on the Markov assumption and consider only the local context of n-1 words or characters
    • Long-range dependencies and global coherence are not effectively captured by n-gram models
  • The number of parameters in an n-gram model grows exponentially with increasing n
    • Higher-order models require a large amount of training data to estimate the probabilities reliably
    • The sparsity problem becomes more severe for higher-order models, as many n-grams may not be observed in the training data