Tree-Based Methods in Machine Learning
Tree-based methods are a family of supervised learning algorithms for classification and regression. They make predictions by recursively splitting the data on feature values, which lets them handle both categorical and numerical features with little preprocessing (no scaling or centering is required). The family includes single decision trees, random forests, and gradient-boosted trees. A single tree is easy to interpret and visualize, while the ensemble variants capture complex nonlinear relationships and interactions at some cost in interpretability; some implementations, such as XGBoost, also handle missing values natively. Tree-based methods are a strong default for tabular data with mixed feature types, but they can overfit if tree depth and leaf size are not properly tuned.
from sklearn.tree import DecisionTreeClassifier

# Fit a single decision tree; max_depth and min_samples_leaf limit tree growth to curb overfitting
clf = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
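To gauge how well the fitted tree generalizes, the held-out predictions can be scored. The snippet below is a minimal sketch that assumes y_test holds the true labels for X_test (from the same train/test split that produced X_train and y_train).

from sklearn.metrics import accuracy_score, classification_report

# Compare the tree's predictions against the held-out labels (y_test is assumed from the split)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))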
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Visualize the fitted tree; feature_names and class_names are the column and label names
plot_tree(clf, filled=True, feature_names=feature_names, class_names=class_names)
plt.show()
from sklearn.ensemble import RandomForestClassifier

# Random forest: an ensemble of trees trained on bootstrap samples with random feature subsets
rf = RandomForestClassifier(n_estimators=100, max_depth=5)
rf.fit(X_train, y_train)
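One way to retain some interpretability with an ensemble is to inspect the impurity-based feature importances that scikit-learn exposes on fitted forests. A small sketch, reusing the feature_names list from the plotting step above:

import numpy as np

# Rank features by the forest's impurity-based importance scores, highest first
importances = rf.feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(feature_names[idx], round(importances[idx], 3))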
import xgboost as xgb

# Gradient boosting with XGBoost's native API: data is wrapped in a DMatrix before training
params = {'objective': 'binary:logistic', 'learning_rate': 0.1, 'max_depth': 3}
dtrain = xgb.DMatrix(X_train, label=y_train)
model = xgb.train(params, dtrain, num_boost_round=100)
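Predictions from the native API also go through a DMatrix. With the binary:logistic objective above, predict returns probabilities, so a threshold is applied to obtain class labels. A minimal sketch, assuming the same X_test as before:

# Wrap the test features in a DMatrix and threshold the predicted probabilities at 0.5
dtest = xgb.DMatrix(X_test)
pred_proba = model.predict(dtest)
y_pred_xgb = (pred_proba > 0.5).astype(int)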
from sklearn.model_selection import GridSearchCV

# Search over tree depth and minimum leaf size with 5-fold cross-validation
param_grid = {'max_depth': [3, 5, 7], 'min_samples_leaf': [5, 10, 20]}
grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
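Because GridSearchCV refits the best configuration on the full training set by default (refit=True), the tuned model can be pulled out and scored directly. A short usage sketch under that assumption, again assuming y_test from the split:

# best_estimator_ is the tree refit with best_params_ on all of X_train
best_tree = grid_search.best_estimator_
print(grid_search.best_params_, grid_search.best_score_)
print(best_tree.score(X_test, y_test))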