Transfer learning is one of the most powerful concepts you'll encounter in modern image analysis, and it shows up repeatedly on exams because it addresses a fundamental challenge: how do we build effective models when we don't have millions of labeled images? The strategies covered here demonstrate core principles of knowledge reuse, domain generalization, model efficiency, and adaptive learning—all testable concepts that connect to broader themes about how neural networks learn representations and how those representations can be leveraged across tasks.
You're being tested on your understanding of when and why to apply different transfer strategies, not just what they are. An FRQ might ask you to recommend an approach given specific constraints (limited data, new classes, computational limits), so don't just memorize definitions—know what problem each strategy solves and how it compares to alternatives. The underlying principle is always the same: learned features are valuable, and smart reuse beats training from scratch.
These strategies focus on taking an existing model trained on large datasets and modifying it to work on your specific task. The core mechanism involves selectively updating network weights while preserving useful learned representations.
Compare: Fine-tuning vs. Feature extraction—both reuse pre-trained models, but fine-tuning updates weights while feature extraction keeps them frozen. If an FRQ gives you very limited data and computational resources, feature extraction is often the safer choice; fine-tuning risks overfitting without enough examples.
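The frozen-backbone idea can be illustrated with a toy sketch. Here a fixed random projection stands in for a real pre-trained backbone (in practice, a CNN trained on ImageNet), and only a small logistic-regression head is trained on the new task; the data, sizes, and learning rate are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: a fixed (frozen) random projection.
# In a real pipeline this would be a CNN pre-trained on e.g. ImageNet.
W_backbone = rng.normal(size=(4, 16))          # frozen weights, never updated

def features(x):
    """Feature extraction: pass inputs through the frozen backbone."""
    return np.maximum(x @ W_backbone, 0.0)     # ReLU features

# Toy binary classification data for the "new" task.
X = rng.normal(size=(64, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the small linear head is trained -- the backbone stays fixed,
# which is why feature extraction is cheap and hard to overfit.
w_head = np.zeros(16)
b_head = 0.0
lr = 0.5
F = features(X)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(F @ w_head + b_head)))   # sigmoid
    w_head -= lr * (F.T @ (p - y) / len(y))            # logistic-loss gradient
    b_head -= lr * np.mean(p - y)

p_final = 1.0 / (1.0 + np.exp(-(F @ w_head + b_head)))
acc = np.mean((p_final > 0.5) == y)
```

Fine-tuning would differ only in that gradients would also flow into `W_backbone`, which needs more data to do safely.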
When your training data comes from a different distribution than your target application, these strategies help models generalize across that gap. The mechanism involves learning representations that are invariant to domain-specific characteristics.
Compare: Domain adaptation vs. Adversarial transfer learning—both address distribution differences, but domain adaptation focuses on natural domain shift (lab vs. field images) while adversarial transfer specifically targets robustness against malicious perturbations. Know which problem you're solving.
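One simple family of domain adaptation methods aligns feature statistics across domains. The sketch below uses per-domain standardization as a crude stand-in for moment-matching approaches; the synthetic "daytime" and "nighttime" feature distributions are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Source-domain features (e.g. daytime images) and target-domain features
# (e.g. nighttime images): same content, shifted statistics.
source = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
target = rng.normal(loc=2.0, scale=0.5, size=(500, 3))   # domain shift

def align(feats):
    """Standardize each domain to zero mean and unit variance.

    A minimal stand-in for statistics-matching adaptation: after
    alignment, a classifier trained on source features sees target
    features with comparable statistics.
    """
    return (feats - feats.mean(axis=0)) / feats.std(axis=0)

src_a, tgt_a = align(source), align(target)
gap_before = np.abs(source.mean(axis=0) - target.mean(axis=0)).max()
gap_after = np.abs(src_a.mean(axis=0) - tgt_a.mean(axis=0)).max()
```

Stronger methods (e.g. adversarial feature alignment) learn the invariant representation end to end rather than normalizing statistics after the fact.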
These strategies tackle the extreme data scarcity problem—what if you only have a handful of examples, or none at all, for certain classes? The mechanism relies on learning transferable meta-knowledge or leveraging semantic relationships between classes.
Compare: Few-shot vs. Zero-shot learning—few-shot requires at least some examples of new classes; zero-shot requires none but depends on semantic information linking new classes to known ones. FRQs may ask which to use given specific data availability constraints.
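Zero-shot's reliance on semantic information can be made concrete with an attribute-based sketch. The classes, attributes, and predicted scores below are hypothetical; the point is that classification needs only a semantic description of each new class, not training images:

```python
import numpy as np

# Hypothetical semantic attributes: [has_stripes, has_mane, lives_in_water].
class_attributes = {
    "zebra":   np.array([1.0, 0.0, 0.0]),
    "lion":    np.array([0.0, 1.0, 0.0]),
    "dolphin": np.array([0.0, 0.0, 1.0]),
}

def zero_shot_classify(predicted_attrs, class_attrs):
    """Assign the class whose attribute vector is nearest to the
    attribute scores predicted from the image. No training images of
    these classes are needed -- only their semantic descriptions."""
    return min(class_attrs,
               key=lambda c: np.linalg.norm(class_attrs[c] - predicted_attrs))

# Suppose an attribute predictor (trained on *other* classes) outputs
# these scores for a new image:
pred = np.array([0.9, 0.1, 0.05])
label = zero_shot_classify(pred, class_attributes)
```

Few-shot learning would instead adapt from the handful of labeled examples directly, with no attribute vectors required.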
Rather than training separate models for each task, these strategies share knowledge across related problems. The mechanism exploits the fact that related tasks often benefit from similar underlying representations.
Compare: Multi-task learning vs. Progressive neural networks—multi-task trains everything together (requires all tasks upfront), while progressive networks add tasks sequentially (supports continual learning). Choose based on whether tasks arrive simultaneously or over time.
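The "shared underlying representation" in multi-task learning is just a shared trunk feeding separate task heads. This forward-pass sketch (sizes and task pairing are illustrative assumptions) shows where the sharing happens:

```python
import numpy as np

rng = np.random.default_rng(2)

# One shared trunk, one head per task: both tasks read from, and during
# training would update, the same shared representation.
W_shared = rng.normal(size=(4, 6)) * 0.1   # shared by all tasks
W_task_a = rng.normal(size=(6, 2)) * 0.1   # head for task A (e.g. classification)
W_task_b = rng.normal(size=(6, 1)) * 0.1   # head for task B (e.g. regression)

def forward(x):
    h = np.tanh(x @ W_shared)              # shared features
    return h @ W_task_a, h @ W_task_b

x = rng.normal(size=(8, 4))
out_a, out_b = forward(x)

# Joint training minimizes a weighted sum of task losses, so gradients
# from both tasks flow into W_shared:
#   loss = loss_a(out_a, y_a) + lam * loss_b(out_b, y_b)
```

A progressive network would instead freeze the trained trunk and add a new column (with lateral connections) for each later task, which is what prevents catastrophic forgetting.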
These strategies focus on maintaining performance while reducing computational costs—essential for deployment on resource-constrained devices. The mechanism involves compressing knowledge from large models into smaller, faster ones.
Compare: Knowledge distillation vs. Feature extraction—both leverage pre-trained models, but distillation creates a new compact model while feature extraction uses the original model as a fixed component. Distillation is better when you need a standalone efficient model.
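The core of knowledge distillation is training the student on the teacher's temperature-softened output distribution. A minimal sketch of that loss, with made-up logits for a single image:

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T softens the distribution,
    exposing the teacher's relative confidences across wrong classes."""
    z = np.asarray(z) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative logits: large teacher model vs. small student model.
teacher_logits = np.array([4.0, 1.0, 0.5])
student_logits = np.array([2.0, 1.5, 0.2])

T = 4.0
p_teacher = softmax(teacher_logits, T)   # soft targets ("dark knowledge")
p_student = softmax(student_logits, T)

# Distillation loss: cross-entropy of the student against the teacher's
# softened distribution (in practice combined with the hard-label loss).
distill_loss = -np.sum(p_teacher * np.log(p_student))
```

Because the soft targets carry information about class similarity, the compact student can approach the teacher's performance despite having far fewer parameters.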
| Concept | Relevant strategies |
|---|---|
| Reusing pre-trained weights | Fine-tuning, Feature extraction, Layer freezing |
| Handling domain shift | Domain adaptation, Adversarial transfer learning |
| Extreme data scarcity | Few-shot learning, Zero-shot learning |
| Multi-task knowledge sharing | Multi-task learning, Progressive neural networks |
| Model compression | Knowledge distillation |
| Preventing catastrophic forgetting | Progressive neural networks, Layer freezing |
| Deployment efficiency | Knowledge distillation, Feature extraction |
Which two strategies both address the problem of learning new classes with minimal data, and what key resource does each require?
You have a pre-trained ImageNet model and only 500 labeled medical images. Compare the tradeoffs between fine-tuning and feature extraction for this scenario.
A model trained on daytime traffic images performs poorly on nighttime images. Which transfer learning strategy directly addresses this problem, and what is its core mechanism?
Explain why progressive neural networks prevent catastrophic forgetting while standard fine-tuning does not. What architectural difference makes this possible?
FRQ-style: You need to deploy an image classifier on mobile devices with limited memory, but your best-performing model is too large. Describe a transfer learning strategy that could help, and explain how it preserves performance while reducing model size.