Data mining sits at the heart of cognitive computing in business—it's how organizations transform raw data into actionable intelligence. When you're tested on these techniques, you're really being evaluated on your understanding of when to apply which method, what business problems each solves, and how algorithms learn patterns from data. The exam will expect you to distinguish between supervised and unsupervised approaches, recognize appropriate use cases, and understand the tradeoffs between interpretability and predictive power.
Don't fall into the trap of memorizing algorithm names without context. Instead, focus on the underlying logic: classification assigns labels, clustering finds natural groupings, regression predicts continuous values, and association rules uncover hidden relationships. Each technique answers a different business question, and knowing which question each answers is what separates strong exam performance from mediocre recall.
These techniques require labeled training data—you're teaching the algorithm what "correct" looks like so it can make predictions on new data. The model learns a mapping function from inputs to known outputs.
Compare: Decision Trees vs. Support Vector Machines. Both handle classification, but Decision Trees prioritize interpretability while SVMs prioritize accuracy in complex, high-dimensional spaces. If an FRQ asks about explaining a model to executives, go with Decision Trees; when predictive accuracy on complex, high-dimensional data matters more than explainability, SVMs are the stronger answer.
Compare: Naive Bayes vs. KNN—both are simple to implement, but Naive Bayes builds a probabilistic model while KNN stores all training data. Naive Bayes handles high-dimensional text data efficiently; KNN struggles with the "curse of dimensionality" but captures local patterns better.
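To make the "KNN stores all training data" point concrete, here is a minimal sketch of a K-Nearest Neighbors classifier in plain Python. The customer features, labels, and the function name `knn_predict` are hypothetical, chosen only for illustration; a real project would more likely use a library implementation and scale the features first.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors.

    `train` is a list of (features, label) pairs, where features are
    equal-length tuples of numbers.
    """
    # KNN has no training step: it keeps every example and measures
    # Euclidean distance to all of them at prediction time.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled customers: (monthly_spend, support_tickets) -> outcome
customers = [
    ((20.0, 5.0), "churn"),
    ((25.0, 4.0), "churn"),
    ((22.0, 6.0), "churn"),
    ((80.0, 1.0), "stay"),
    ((90.0, 0.0), "stay"),
    ((85.0, 2.0), "stay"),
]

prediction = knn_predict(customers, (24.0, 5.0), k=3)
print(prediction)  # the three closest customers are all labeled "churn"
```

Because every prediction scans the full training set, cost grows with the amount of stored data, which is one reason KNN is usually a poor fit for very large or very high-dimensional datasets.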
These techniques work with unlabeled data—the algorithm identifies patterns without being told what to look for. The goal is to uncover natural groupings or relationships that humans might miss.
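As an illustration of finding groupings without labels, the following is a minimal sketch of the K-Means idea on one-dimensional data. The spending figures and the starting centroids are assumptions made for the example; practical implementations pick starting points more carefully and work on many features at once.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around centroids by alternating two steps:
    assign each value to its nearest centroid, then move each
    centroid to the mean of the values assigned to it.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer; no labels are supplied.
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = kmeans_1d(spend, centroids=[10, 50])
print(clusters)  # low spenders and high spenders end up in separate groups
```

Note that the algorithm is never told which customers are "low" or "high" spenders; the two segments emerge from the distances alone.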
Compare: Clustering vs. Association Rule Mining—both are unsupervised, but clustering groups entities (customers, products) while association rules find relationships between items (purchase patterns). Clustering answers "who are my customer segments?" while association rules answer "what do they buy together?"
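Association rules are usually evaluated with support, confidence, and lift. The sketch below computes all three for a single rule over a tiny, made-up set of shopping baskets; the function name `rule_metrics` and the basket contents are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    count_a = sum(1 for b in baskets if antecedent <= b)
    count_c = sum(1 for b in baskets if consequent <= b)
    count_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each appear at least once")

    support = count_both / n            # share of all baskets containing both sides
    confidence = count_both / count_a   # share of antecedent baskets that also contain the consequent
    lift = confidence / (count_c / n)   # confidence relative to the consequent's baseline frequency
    return support, confidence, lift


# Hypothetical baskets
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["milk"],
    ["bread", "butter", "jam"],
]

support, confidence, lift = rule_metrics(baskets, ["bread"], ["butter"])
print(support, confidence, lift)  # roughly 0.6, 0.75, and 1.25
```

A lift above 1 suggests the items appear together more often than their individual frequencies would predict; a lift near 1 suggests no real association.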
When your target variable is a number rather than a category, you need regression techniques. These models estimate the mathematical relationship between predictors and outcomes.
Compare: Regression Analysis vs. Neural Networks. Both can predict continuous values (neural networks can also classify), but regression offers interpretable coefficients (you can explain why the prediction changed) while neural networks are "black boxes" that often achieve higher accuracy on complex, nonlinear tasks when enough data is available. Choose regression when explainability matters; choose neural networks when prediction accuracy is paramount.
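The "interpretable coefficients" of regression can be seen in a minimal sketch of simple linear regression fitted by ordinary least squares. The ad-spend and revenue numbers are invented and deliberately fall on a perfect line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: ad spend and revenue, both in thousands of dollars
ad_spend = [1, 2, 3, 4, 5]
revenue = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, revenue)
forecast = slope * 6 + intercept
print(slope, intercept, forecast)  # 2.0 10.0 22.0
```

The slope reads directly as a business statement: in this toy data, each additional thousand dollars of ad spend is associated with two thousand dollars more revenue. A neural network offers no single number with that kind of direct reading.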
Some business problems require identifying exceptions rather than rules. These techniques flag observations that deviate significantly from expected patterns.
Compare: Anomaly Detection vs. Classification. Both can identify fraud, but classification requires labeled examples of fraud to train on, while anomaly detection can flag unusual patterns without prior fraud examples. Use classification when you have plenty of reliably labeled historical fraud cases; use anomaly detection when labeled cases are scarce or fraud patterns constantly evolve.
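One simple way to flag unusual observations without any labeled fraud examples is a z-score rule: measure how many standard deviations each value sits from the mean. The sketch below is only one of many anomaly detection approaches, and the transaction amounts and the threshold of 2.0 are assumptions for illustration.

```python
import statistics


def flag_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction amounts; no transaction is labeled as fraud.
amounts = [20, 22, 19, 21, 20, 23, 18, 21, 20, 500]
flagged = flag_anomalies(amounts)
print(flagged)  # [500]
```

A caveat of this rule is that an extreme value inflates the mean and standard deviation it is judged against, so on small samples a strict threshold may fail to flag anything. Methods based on the median or on density are often more robust.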
| Concept | Best Examples |
|---|---|
| Supervised Classification | Decision Trees, Support Vector Machines, Naive Bayes, KNN |
| Unsupervised Grouping | Clustering (K-Means, DBSCAN), Association Rule Mining |
| Continuous Prediction | Regression Analysis, Neural Networks |
| High Interpretability | Decision Trees, Linear Regression, Naive Bayes |
| Complex Pattern Recognition | Neural Networks, Support Vector Machines |
| Text/Document Analysis | Naive Bayes, Support Vector Machines |
| Fraud/Outlier Detection | Anomaly Detection, Clustering-based methods |
| Real-time/Fast Prediction | Naive Bayes, Decision Trees |
Which two techniques both handle classification but differ most in their interpretability—and when would you choose each in a business context?
A retailer wants to understand which products are frequently purchased together. Which technique should they use, and what three metrics would they use to evaluate the discovered patterns?
Compare and contrast clustering and classification: What fundamental difference in the training data determines which approach is appropriate?
Your company has transaction data but very few confirmed fraud cases to learn from. Would you recommend a classification approach or anomaly detection—and why?
An FRQ asks you to recommend a data mining approach for predicting next quarter's sales revenue. Which technique category applies, and what's one key assumption of the simplest model in that category?