random_output_trees.ensemble.LazyBaggingClassifier

class random_output_trees.ensemble.LazyBaggingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, random_state=None, verbose=0)

A lazy bagging classifier.

Everything is done lazily: the base models are built at prediction time and are not kept in memory. Because each model is discarded once it has been used, memory consumption is greatly reduced, which makes it possible to build very large ensembles.
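The lazy strategy can be illustrated with a small, self-contained sketch. It is a toy, not the library's implementation: the class name `ToyLazyBaggingClassifier`, the 1-nearest-neighbour base model and the per-estimator seed derivation (`random_seed_ + i`) are assumptions made for illustration. `fit` only stores the data and a seed; each base model is rebuilt from its seed inside `predict`, used to vote, and then dropped, so at most one model is held in memory at a time.

```python
import random
from collections import Counter


class ToyLazyBaggingClassifier:
    """Hypothetical illustration of lazy bagging (not the library's code)."""

    def __init__(self, n_estimators=10, random_state=0):
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        # Lazy fit: keep the training set and one seed; build no model yet.
        self.X_, self.y_ = list(X), list(y)
        self.random_seed_ = random.Random(self.random_state).randrange(2**31)
        return self

    def _build_estimator(self, i):
        # Estimator i is always rebuilt from the same derived seed (an
        # assumption of this sketch), so it sees the same bootstrap sample.
        rng = random.Random(self.random_seed_ + i)
        n = len(self.X_)
        idx = [rng.randrange(n) for _ in range(n)]  # drawn with replacement
        sample = [(self.X_[j], self.y_[j]) for j in idx]

        def nearest_neighbour(x):
            # Toy base model: 1-nearest neighbour on the bootstrap sample.
            return min(sample, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

        return nearest_neighbour

    def predict(self, X):
        votes = [Counter() for _ in X]
        for i in range(self.n_estimators):
            model = self._build_estimator(i)  # built at prediction time...
            for counter, x in zip(votes, X):
                counter[model(x)] += 1
            del model  # ...and discarded before the next one is built
        return [counter.most_common(1)[0][0] for counter in votes]


X_train = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y_train = [0, 0, 0, 1, 1, 1]
clf = ToyLazyBaggingClassifier(n_estimators=25, random_state=0).fit(X_train, y_train)
print(clf.predict([[0.05], [1.15]]))
```

The trade-off is compute for memory: every call to `predict` refits all `n_estimators` models, which is the price paid for not storing them.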

Parameters:

base_estimator : object or None, optional (default=None)

The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.

n_estimators : int, optional (default=10)

The number of base estimators in the ensemble.

max_samples : int or float, optional (default=1.0)

The number of samples to draw from X to train each base estimator.

If int, then draw max_samples samples.

If float, then draw max_samples * X.shape[0] samples.

max_features : int or float, optional (default=1.0)

The number of features to draw from X to train each base estimator.

If int, then draw max_features features.

If float, then draw max_features * X.shape[1] features.
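The int/float rule shared by `max_samples` and `max_features` can be sketched as a small helper. The name `resolve_count`, the truncation of fractional counts with `int()`, and the range check are assumptions of this sketch, not confirmed behaviour of the class.

```python
def resolve_count(value, total):
    """Turn a max_samples / max_features value into an absolute count.

    Hypothetical helper: an int is taken as-is, a float is a fraction of total.
    """
    if isinstance(value, int):
        count = value
    else:
        count = int(value * total)  # assumption: fractional counts are truncated
    if not 0 < count <= total:
        raise ValueError(f"expected a count in (0, {total}], got {count}")
    return count


n_samples, n_features = 100, 20
print(resolve_count(0.5, n_samples))  # max_samples=0.5 -> 50
print(resolve_count(5, n_features))   # max_features=5 -> 5
```

With the defaults (`max_samples=1.0`, `max_features=1.0`), every base estimator would see as many samples and features as the training matrix has.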

bootstrap : boolean, optional (default=True)

Whether samples are drawn with replacement.

bootstrap_features : boolean, optional (default=False)

Whether features are drawn with replacement.
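How `bootstrap` and `bootstrap_features` change the draw can be shown with the standard-library `random` module. This is a sketch under assumptions: the helper `draw_indices` is hypothetical, and the real class may draw its indices differently (for instance with NumPy).

```python
import random


def draw_indices(rng, total, count, with_replacement):
    """Draw sample (or feature) indices for one base estimator."""
    if with_replacement:
        # bootstrap=True / bootstrap_features=True: an index may repeat
        return [rng.randrange(total) for _ in range(count)]
    # bootstrap=False / bootstrap_features=False: each index at most once
    return rng.sample(range(total), count)


rng = random.Random(42)
rows = draw_indices(rng, total=10, count=10, with_replacement=True)   # bootstrap=True
cols = draw_indices(rng, total=5, count=3, with_replacement=False)    # bootstrap_features=False
print(rows, cols)
```

Without replacement and with `count == total`, the draw is simply a permutation of all indices, so every sample (or feature) is used exactly once.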

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

verbose : int, optional (default=0)

Controls the verbosity of the building process.

Attributes

classes_	array of shape = [n_classes]	The class labels.
n_classes_	int or list	The number of classes.
n_features_	int	Number of features of the fitted input matrix.
n_outputs_	int	Number of outputs of the fitted output matrix.
random_seed_	int	Seed of the random number generator.

Methods

`decision_function`(X)	Average of the decision functions of the base classifiers.
`fit`(X, y[, sample_weight])	Build a lazy bagging ensemble of estimators from the training set.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict class for X.
`predict_log_proba`(X)	Predict class log-probabilities for X.
`predict_proba`(X)	Predict class probabilities for X.
`score`(X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.

__init__(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, random_state=None, verbose=0)

decision_function(X)

Average of the decision functions of the base classifiers.

Parameters:

X : {array-like, sparse matrix} of shape = [n_samples, n_features]

The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:

score : array, shape = [n_samples, k] or list of arrays

The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1; otherwise k == n_classes.
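The averaging described above can be sketched in plain Python. The helper `average_decision_function` is hypothetical and assumes each base classifier's output is given as a list of rows with `k` scores (`k == 1` for binary problems). Because it keeps only a running sum, it also accepts a generator that yields one estimator's scores at a time, which fits a strategy where each model is discarded after use.

```python
def average_decision_function(per_estimator_scores):
    """Average the decision functions of several base classifiers."""
    total, n_est = None, 0
    for scores in per_estimator_scores:  # may be a generator: one estimator at a time
        if total is None:
            total = [list(row) for row in scores]
        else:
            for acc, row in zip(total, scores):
                for j, s in enumerate(row):
                    acc[j] += s  # running sum, column by column
        n_est += 1
    return [[s / n_est for s in row] for row in total]


# Binary case (k == 1): two estimators, two samples.
print(average_decision_function([[[1.0], [-2.0]], [[3.0], [0.0]]]))  # [[2.0], [-1.0]]
```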

fit(X, y, sample_weight=None)

Build a lazy bagging ensemble of estimators from the training set.

Parameters:

X : {array-like, sparse matrix} of shape = [n_samples, n_features]

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like, shape = [n_samples]

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like, shape = [n_samples] or None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self : object

Returns self.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional (default=True)

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.
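The effect of `deep` can be illustrated with a toy estimator. This is a hypothetical sketch: `ToyEstimator` stores its constructor arguments in a dict, whereas scikit-learn-style estimators introspect their `__init__` signature. With `deep=True`, the parameters of a nested estimator such as `base_estimator` are also reported, under `<component>__<parameter>` keys.

```python
class ToyEstimator:
    """Hypothetical estimator used only to illustrate get_params(deep=...)."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                if hasattr(value, "get_params"):
                    # Expose the nested estimator's parameters as <name>__<param>.
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


tree = ToyEstimator(max_depth=3)
bagging = ToyEstimator(base_estimator=tree, n_estimators=10)
print(sorted(bagging.get_params(deep=False)))  # ['base_estimator', 'n_estimators']
print(bagging.get_params()["base_estimator__max_depth"])  # 3
```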

predict(X)

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
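The two aggregation rules can be sketched as follows. Both helpers are hypothetical and operate on plain lists: `predict_from_probas` takes already averaged probabilities and returns the class with the highest value, and `predict_by_vote` is the majority-vote fallback. Tie-breaking here (first class in `classes`, first label encountered) is an assumption of the sketch.

```python
from collections import Counter


def predict_from_probas(mean_probas, classes):
    """Class with the highest mean predicted probability, per sample."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in mean_probas]


def predict_by_vote(per_estimator_labels):
    """Majority vote per sample; per_estimator_labels[e][i] is estimator e's label for sample i."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*per_estimator_labels)]


print(predict_from_probas([[0.2, 0.8], [0.6, 0.4]], ["a", "b"]))  # ['b', 'a']
print(predict_by_vote([["a", "b"], ["a", "a"], ["b", "b"]]))       # ['a', 'b']
```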

Parameters:

X : {array-like, sparse matrix} of shape = [n_samples, n_features]

The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:

y : array of shape = [n_samples, n_outputs]

The predicted classes.

predict_log_proba(X)

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
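The log is taken after averaging. A minimal sketch with a hypothetical helper (`log_of_mean_proba`), assuming each base estimator's probabilities are given as nested lists, shows why the order matters: two estimators that disagree completely average to 0.5 per class, whereas averaging their log-probabilities would give negative infinity.

```python
import math


def log_of_mean_proba(per_estimator_probas):
    """Log of the mean class probabilities over base estimators.

    per_estimator_probas[e][i][c] is the probability of class c for sample i
    according to estimator e (hypothetical layout).
    """
    n_est = len(per_estimator_probas)
    log_proba = []
    for rows in zip(*per_estimator_probas):  # one row per estimator, same sample
        mean = [sum(col) / n_est for col in zip(*rows)]  # average per class
        log_proba.append([math.log(p) if p > 0 else -math.inf for p in mean])
    return log_proba


print(log_of_mean_proba([[[1.0, 0.0]], [[0.0, 1.0]]]))  # log(0.5) for both classes
```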

Parameters:

X : array-like of shape = [n_samples, n_features]

The input samples.

Returns:

p : array of shape = [n_samples, n_classes], or a list of n_outputs such arrays if n_outputs > 1

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting, and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
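When the base estimators lack `predict_proba`, the voting-based probabilities described above can be sketched like this. The helper `vote_proportions` and its list-of-lists input layout are assumptions for illustration; each row sums to 1 because every estimator casts exactly one vote per sample.

```python
from collections import Counter


def vote_proportions(per_estimator_labels, classes):
    """Share of base estimators predicting each class, per sample.

    per_estimator_labels[e][i] is the label predicted by estimator e for
    sample i (hypothetical layout); columns follow the order of `classes`.
    """
    n_est = len(per_estimator_labels)
    proba = []
    for labels in zip(*per_estimator_labels):  # all estimators' votes for one sample
        counts = Counter(labels)
        proba.append([counts[c] / n_est for c in classes])
    return proba


# Four estimators, two samples.
votes = [["a", "b"], ["a", "b"], ["b", "b"], ["a", "a"]]
print(vote_proportions(votes, ["a", "b"]))  # [[0.75, 0.25], [0.25, 0.75]]
```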

Parameters:

X : {array-like, sparse matrix} of shape = [n_samples, n_features]

The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:

p : array of shape = [n_samples, n_classes] or list of arrays

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be predicted correctly.
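The harshness of subset accuracy is easy to see in a small sketch. The function `subset_accuracy` is a hypothetical pure-Python stand-in written for illustration, not the method's actual code; it assumes labels are given as plain lists. In the example, 5 of the 6 individual labels are correct, yet only 2 of the 3 samples count as hits.

```python
def subset_accuracy(y_true, y_pred, sample_weight=None):
    """(Weighted) fraction of samples whose entire label set is predicted exactly."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # A sample counts as correct only if every one of its labels matches.
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


y_true = [[1, 0], [1, 1], [0, 0]]
y_pred = [[1, 0], [1, 0], [0, 0]]
print(subset_accuracy(y_true, y_pred))  # 2 of 3 samples fully correct
```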

Parameters:

X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) with respect to y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
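The routing of `<component>__<parameter>` keys can be sketched with a toy class. `ToyNestedEstimator` is hypothetical and stores its parameters in a dict; real estimators validate parameter names against their constructor, which this sketch does not do.

```python
class ToyNestedEstimator:
    """Hypothetical estimator illustrating set_params on nested objects."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            name, sep, sub = key.partition("__")
            if sep:
                # "<component>__<parameter>": collect it for the nested object
                nested.setdefault(name, {})[sub] = value
            else:
                self.params[key] = value
        # Forward the collected parameters to each nested component.
        for name, sub_params in nested.items():
            self.params[name].set_params(**sub_params)
        return self


tree = ToyNestedEstimator(max_depth=None)
bagging = ToyNestedEstimator(base_estimator=tree, n_estimators=10)
bagging.set_params(n_estimators=50, base_estimator__max_depth=3)
print(bagging.params["n_estimators"], tree.params["max_depth"])  # 50 3
```

Returning `self` is what allows calls such as `bagging.set_params(...).params` to be chained.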

Returns:

self : object

Returns self.