A decision tree classifier.
Parameters
criterion : string, optional (default="gini")
    The function to measure the quality of a split. Supported criteria are
    "gini" for the Gini impurity and "entropy" for the information gain.
splitter : string, optional (default="best")
    The strategy used to choose the split at each node. Supported strategies
    are "best" to choose the best split and "random" to choose the best
    random split.
max_features : int, float, string or None, optional (default=None)
    The number of features to consider when looking for the best split. If
    None, then max_features=n_features.
max_depth : int or None, optional (default=None)
    The maximum depth of the tree. If None, nodes are expanded until all
    leaves are pure or until all leaves contain less than min_samples_split
    samples.
min_samples_split : int, optional (default=2)
    The minimum number of samples required to split an internal node.
min_samples_leaf : int, optional (default=1)
    The minimum number of samples required to be at a leaf node.
min_weight_fraction_leaf : float, optional (default=0.)
    The minimum weighted fraction of the input samples required to be at a
    leaf node.
max_leaf_nodes : int or None, optional (default=None)
    Grow a tree with max_leaf_nodes in best-first fashion; best nodes are
    defined as relative reduction in impurity. If None, the number of leaf
    nodes is unlimited.
random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator; if
    RandomState instance, random_state is the random number generator; if
    None, the random number generator is the RandomState instance used by
    numpy.random.
output_transformer : scikit-learn transformer or None, optional (default=None)
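A minimal sketch (not part of the original reference) showing how the size-control parameters above constrain tree growth; the exact node counts depend on the data:

>>> from sklearn.datasets import load_iris
>>> from sklearn.tree import DecisionTreeClassifier
>>> iris = load_iris()
>>> # Fully grown tree: expands until all leaves are pure.
>>> deep = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
>>> # Regularized tree: capped depth and a minimum leaf size.
>>> shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
...                                  random_state=0).fit(iris.data, iris.target)
>>> shallow.tree_.node_count < deep.tree_.node_count
True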
See also
DecisionTreeRegressor
References
[R15] http://en.wikipedia.org/wiki/Decision_tree_learning
[R16] L. Breiman, J. Friedman, R. Olshen, and C. Stone, "Classification and Regression Trees", Wadsworth, Belmont, CA, 1984.
[R17] T. Hastie, R. Tibshirani and J. Friedman, "Elements of Statistical Learning", Springer, 2009.
[R18] L. Breiman and A. Cutler, "Random Forests", http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
Examples
>>> from sklearn.datasets import load_iris
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
        0.93...,  0.93...,  1.     ,  0.93...,  1.      ])
Attributes
feature_importances_ : array, shape = [n_features]
    Return the feature importances.
tree_ : Tree object
    The underlying Tree object.
max_features_ : int
    The inferred value of max_features.
classes_ : array of shape = [n_classes] or a list of such arrays
    The class labels (single output problem), or a list of arrays of class
    labels (multi-output problem).
n_classes_ : int or list
    The number of classes (for single output problems), or a list containing
    the number of classes for each output (for multi-output problems).
Methods
fit(X, y[, sample_weight, check_input])
    Build a decision tree from the training set (X, y).
fit_transform(X[, y])
    Fit to data, then transform it.
get_params([deep])
    Get parameters for this estimator.
predict(X)
    Predict class or regression value for X.
predict_log_proba(X)
    Predict class log-probabilities of the input samples X.
predict_proba(X)
    Predict class probabilities of the input samples X.
score(X, y[, sample_weight])
    Return the mean accuracy on the given test data and labels.
set_params(**params)
    Set the parameters of this estimator.
transform(*args, **kwargs)
    DEPRECATED: Support to use estimators as feature selectors will be
    removed in version 0.19.
Return the feature importances.
The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
Returns
feature_importances_ : array, shape = [n_features]
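A small sketch (reusing iris and DecisionTreeClassifier from the Examples section) pairing each importance with its feature name; output omitted here:

>>> clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
>>> for name, importance in zip(iris.feature_names, clf.feature_importances_):
...     print(name, importance)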
Build a decision tree from the training set (X, y).
Parameters
X : array-like, shape = [n_samples, n_features]
    The training input samples.
y : array-like, shape = [n_samples] or [n_samples, n_outputs]
    The target values (class labels in classification, real numbers in
    regression).
sample_weight : array-like, shape = [n_samples] or None
    Sample weights. If None, then samples are equally weighted.
check_input : boolean, (default=True)
    Allow to bypass several input checks. Don't use this parameter unless
    you know what you are doing.

Returns
self : object
    Returns self.
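A minimal usage sketch, reusing the iris data from the Examples section; uniform sample weights reproduce the unweighted fit:

>>> import numpy as np
>>> clf = DecisionTreeClassifier(random_state=0)
>>> clf = clf.fit(iris.data, iris.target)
>>> clf = clf.fit(iris.data, iris.target,
...               sample_weight=np.ones(len(iris.target)))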
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
X : numpy array of shape [n_samples, n_features]
    Training set.
y : numpy array of shape [n_samples]
    Target values.

Returns
X_new : numpy array of shape [n_samples, n_features_new]
    Transformed array.
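For a tree used as a feature selector (the deprecated path noted under transform below), fit_transform fits the tree and returns only the selected columns; a sketch:

>>> X_new = DecisionTreeClassifier(random_state=0).fit_transform(iris.data, iris.target)
>>> X_new.shape[1] <= iris.data.shape[1]
True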
Get parameters for this estimator.
Parameters
deep : boolean, optional
    If True, will return the parameters for this estimator and contained
    subobjects that are estimators.

Returns
params : mapping of string to any
    Parameter names mapped to their values.
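For instance, a constructor argument can be read back by name:

>>> DecisionTreeClassifier(max_depth=3).get_params()['max_depth']
3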
Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
Parameters
X : array-like of shape = [n_samples, n_features]
    The input samples.

Returns
y : array of shape = [n_samples] or [n_samples, n_outputs]
    The predicted classes, or the predicted values.
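A short sketch on the iris data from the Examples section; the first three samples all belong to class 0:

>>> clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
>>> clf.predict(iris.data[:3])
array([0, 0, 0])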
Predict class log-probabilities of the input samples X.
Parameters
X : array-like of shape = [n_samples, n_features]
    The input samples.

Returns
p : array of shape = [n_samples, n_classes], or a list of n_outputs such
    arrays if n_outputs > 1
    The class log-probabilities of the input samples. The order of the
    classes corresponds to that in the attribute classes_.
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.

Parameters
X : array-like of shape = [n_samples, n_features]
    The input samples.

Returns
p : array of shape = [n_samples, n_classes], or a list of n_outputs such
    arrays if n_outputs > 1
    The class probabilities of the input samples. The order of the classes
    corresponds to that in the attribute classes_.
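Continuing with the clf fitted above, each row of the returned array is a probability distribution over the three iris classes, and its argmax matches predict:

>>> proba = clf.predict_proba(iris.data[:3])
>>> proba.shape
(3, 3)
>>> proba.argmax(axis=1)
array([0, 0, 0])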
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be predicted correctly.
Parameters
X : array-like, shape = (n_samples, n_features)
    Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
    True labels for X.
sample_weight : array-like, shape = [n_samples], optional
    Sample weights.

Returns
score : float
    Mean accuracy of self.predict(X) with respect to y.
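As a sanity check, a fully grown tree scores perfectly on its own training data, which is precisely why the cross-validated scores in the Examples section are the more honest estimate:

>>> clf.score(iris.data, iris.target)
1.0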
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns
self
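A sketch of the nested <component>__<parameter> convention; the step name 'tree' is illustrative:

>>> from sklearn.pipeline import Pipeline
>>> pipe = Pipeline([('tree', DecisionTreeClassifier())])
>>> pipe = pipe.set_params(tree__max_depth=2)
>>> pipe.get_params()['tree__max_depth']
2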
DEPRECATED: Support to use estimators as feature selectors will be removed in version 0.19. Use SelectFromModel instead.
Reduce X to its most important features.
Uses coef_ or feature_importances_ to determine the most important features. For models with a coef_ for each class, the absolute sum over the classes is used.
Parameters
X : array or scipy sparse matrix of shape [n_samples, n_features]
    The input samples.

Returns
X_r : array of shape [n_samples, n_selected_features]
    The input samples with only the selected features.
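As the deprecation note advises, SelectFromModel provides the same reduction going forward; a minimal sketch:

>>> from sklearn.feature_selection import SelectFromModel
>>> selector = SelectFromModel(DecisionTreeClassifier(random_state=0))
>>> X_r = selector.fit_transform(iris.data, iris.target)
>>> X_r.shape[1] <= iris.data.shape[1]
True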