Tree-based methods are powerful machine learning algorithms for classification and regression. They make predictions with decision tree structures that recursively split the data on feature values, and they handle both categorical and numerical features with little preprocessing. The family includes decision trees, random forests, and gradient boosting. Tree-based models are interpretable, capture non-linear relationships and feature interactions, and several implementations handle missing values gracefully. They are a strong choice for mixed data types and when interpretability matters, but they can overfit if not properly tuned.
```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10)
clf.fit(X_train, y_train)
```
```python
y_pred = clf.predict(X_test)
```
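Predictions are usually checked against held-out labels with a metric such as accuracy. A minimal, self-contained sketch is below; the Iris dataset and the train/test split are illustrative stand-ins for the `X_train`/`X_test` arrays assumed above, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: Iris stands in for the original X_train/X_test.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)  # fraction of correct predictions
print(acc)
```

`accuracy_score` is one option; `classification_report` gives per-class precision and recall when classes are imbalanced.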
```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Visualize the fitted tree; feature_names and class_names label the nodes.
plot_tree(clf, filled=True, feature_names=feature_names, class_names=class_names)
plt.show()
```
```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, max_depth=5)
rf.fit(X_train, y_train)
```
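A fitted forest also exposes impurity-based feature importances, which support the interpretability point made above. A hedged sketch follows; the Iris dataset and `random_state` are assumptions for illustration, not from the original snippet.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Illustrative data: Iris stands in for the original X_train/y_train.
data = load_iris()
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
rf.fit(data.data, data.target)

# Impurity-based importances: non-negative values that sum to 1.
for name, imp in zip(data.feature_names, rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Note that impurity-based importances can favor high-cardinality features; permutation importance is a common cross-check.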
```python
import xgboost as xgb

params = {'objective': 'binary:logistic', 'learning_rate': 0.1, 'max_depth': 3}
dtrain = xgb.DMatrix(X_train, label=y_train)
model = xgb.train(params, dtrain, num_boost_round=100)
```
```python
from sklearn.model_selection import GridSearchCV

param_grid = {'max_depth': [3, 5, 7], 'min_samples_leaf': [5, 10, 20]}
grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
```
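After the search, `GridSearchCV` refits the best configuration on the full training set and exposes it as `best_estimator_`, so it can be evaluated directly. A self-contained end-to-end sketch is below; the Iris dataset and the split are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: Iris stands in for the original X_train/X_test.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

param_grid = {'max_depth': [3, 5, 7], 'min_samples_leaf': [5, 10, 20]}
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# The refitted best model is available directly for test-set evaluation.
best_clf = grid_search.best_estimator_
print(grid_search.best_params_)
print(best_clf.score(X_test, y_test))
```

Because the grid was chosen to limit depth and leaf size, the selected model trades a little training accuracy for less overfitting, which is the tuning concern raised at the start of this section.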