koho.sklearn.DecisionTreeClassifier

class koho.sklearn.DecisionTreeClassifier(class_balance='balanced', max_depth=None, max_features=None, max_thresholds=None, random_state=None)

A decision tree classifier.
- Parameters
- class_balance : string 'balanced' or None, optional (default='balanced')
Weighting of the classes.
If 'balanced', the values of y are used to automatically adjust class weights inversely proportional to class frequencies in the input data.
If None, all classes are assumed to have weight one.
- max_depth : integer or None, optional (default=None)
The maximum depth of the tree.
The tree is expanded until the specified maximum depth is reached, all leaves are pure, or no further impurity improvement can be achieved.
If None, the maximum depth of the tree is set to max long (2^31-1).
- max_features : int, float, string or None, optional (default=None)
Note: only to be used by a Decision Forest.
The number of random features to consider when looking for the best split at each node.
If int, then consider max_features features.
If float, then max_features is a percentage and int(max_features * n_features) features are considered.
If 'auto', then max_features=sqrt(n_features).
If 'sqrt', then max_features=sqrt(n_features).
If 'log2', then max_features=log2(n_features).
If None, then max_features=n_features, considering all features in random order.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, up to the point that all features have been considered, even if that means inspecting more than max_features features.
Decision Tree: max_features=None and max_thresholds=None
Random Tree: max_features < n_features and max_thresholds=None
- max_thresholds : 1 or None, optional (default=None)
Note: only to be used by a Decision Forest.
The number of random thresholds to consider when looking for the best split at each node.
If 1, then consider one random threshold, based on the Extremely Randomized Trees formulation.
If None, then all thresholds, based on the mid-points of the node samples, are considered.
Extremely Randomized Trees (ET): max_thresholds=1
Totally Randomized Trees: max_features=1 and max_thresholds=1, very similar to Perfect Random Trees (PERT).
- random_state : int or None, optional (default=None)
Note: only to be used by a Decision Forest.
A random state to control pseudo-random number generation and the reproducibility of fit().
If int, random_state is the seed used by the random number generator.
If None, the random number generator is seeded with the current system time.
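The int / float / string forms of max_features documented above can be made concrete with a small helper. The function below is purely illustrative (it is not part of koho) and simply mirrors the documented semantics:

```python
import math

def resolve_max_features(max_features, n_features):
    """Illustrative helper (not part of koho): map a max_features
    setting to the concrete number of features considered per split."""
    if max_features is None:
        return n_features                      # all features, random order
    if isinstance(max_features, int):
        return max_features                    # fixed number of features
    if isinstance(max_features, float):
        return int(max_features * n_features)  # percentage of features
    if max_features in ("auto", "sqrt"):
        return int(math.sqrt(n_features))
    if max_features == "log2":
        return int(math.log2(n_features))
    raise ValueError(f"unsupported max_features: {max_features!r}")
```

For example, with 16 features, 'sqrt' and 'log2' both yield 4, while 0.5 yields 8.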
- Attributes
- classes_ : array, shape = [n_classes]
The class labels.
- n_classes_ : int
The number of classes.
- n_features_ : int
The number of features.
- max_features_ : int
The inferred value of max_features.
- tree_ : tree object
The underlying estimator.
- feature_importances_ : array, shape = [n_features]
The feature importances derived from the decision tree.
__init__(self, class_balance='balanced', max_depth=None, max_features=None, max_thresholds=None, random_state=None)

Create a new decision tree classifier and initialize it with hyperparameters.
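A minimal usage sketch. Because koho follows the scikit-learn estimator API, scikit-learn's own DecisionTreeClassifier is used below as a stand-in (its class_weight='balanced' parameter plays the role of koho's class_balance='balanced'); with koho installed, `from koho.sklearn import DecisionTreeClassifier` should be a drop-in replacement:

```python
# Stand-in for koho's DecisionTreeClassifier (same estimator API);
# sklearn's class_weight='balanced' mirrors koho's class_balance='balanced'.
from sklearn.tree import DecisionTreeClassifier

X = [[0.0], [1.0], [2.0], [3.0]]   # 4 samples, 1 feature
y = [0, 0, 1, 1]                    # 2 classes

clf = DecisionTreeClassifier(class_weight='balanced', max_depth=None,
                             random_state=0)
clf = clf.fit(X, y)                 # fit() returns self

pred = clf.predict([[0.5], [2.5]])  # classify two unseen samples
```

The fully grown tree splits at the mid-point 1.5, so 0.5 falls in the class-0 leaf and 2.5 in the class-1 leaf.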
export_graphviz(self, feature_names=None, class_names=None, rotate=False)

Export the decision tree in GraphViz dot format.
- Parameters
- feature_names : list of strings, optional (default=None)
Names of each of the features.
- class_names : list of strings, optional (default=None)
Names of each of the classes in ascending numerical order. Classes are represented as integers: 0, 1, ..., (n_classes-1). If y consists of class labels, those class labels need to be provided as class_names again.
- rotate : bool, optional (default=False)
When set to True, orient the tree left to right rather than top-down.
- Returns
- dot_data : string
String representation of the decision tree classifier in GraphViz dot format.
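In koho, export_graphviz is a method on the fitted classifier; scikit-learn exposes the equivalent as the module-level function sklearn.tree.export_graphviz, used below as a stand-in to show what the returned dot string looks like:

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# koho equivalent (method form, hypothetical usage):
#   dot_data = clf.export_graphviz(feature_names=['x'], class_names=['a', 'b'])
dot_data = export_graphviz(clf, out_file=None,
                           feature_names=['x'], class_names=['a', 'b'])

# The dot string can be rendered with the graphviz package:
#   import graphviz; graphviz.Source(dot_data).render('tree')
```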
export_text(self)

Export the decision tree in a simple text format.

- Returns
- data : string
String representation of the decision tree classifier in a simple text format.
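scikit-learn's module-level sklearn.tree.export_text is the counterpart of this method; the stand-in below shows the kind of indented rule listing such a text export produces:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# koho equivalent (method form, hypothetical usage): text = clf.export_text()
text = export_text(clf, feature_names=['x'])
print(text)  # one "|---" line per decision rule and leaf
```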
feature_importances_

Get the feature importances from the decision tree.
fit(self, X, y)

Build a decision tree classifier from the training data.

- Parameters
- X : array, shape = [n_samples, n_features]
The training input samples.
- y : array, shape = [n_samples]
The target class labels corresponding to the training input samples.
- Returns
- self : object
Returns self.
get_params(self, deep=True)

Get parameters for this estimator.

- Parameters
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : mapping of string to any
Parameter names mapped to their values.
predict(self, X)

Predict classes for the test data.

- Parameters
- X : array, shape = [n_samples, n_features]
The test input samples.
- Returns
- y : array, shape = [n_samples]
The predicted classes for the test input samples.
predict_proba(self, X)

Predict class probabilities for the test data.

- Parameters
- X : array, shape = [n_samples, n_features]
The test input samples.
- Returns
- p : array, shape = [n_samples, n_classes]
The predicted class probabilities for the test input samples.
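Each row of the returned array is a probability distribution over classes_, so the entries of every row sum to one. A sketch using scikit-learn's DecisionTreeClassifier as a stand-in for koho's:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in for koho

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

proba = clf.predict_proba([[0.5], [2.5]])  # shape (2, n_classes)
# proba[i, j] is the estimated probability that sample i belongs to classes_[j]
```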
score(self, X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires each sample's label set to be predicted correctly in full.

- Parameters
- X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
- Returns
- score : float
Mean accuracy of self.predict(X) with respect to y.
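score(X, y) is simply the mean of predict(X) == y; a fully grown tree evaluated on consistent training data therefore scores 1.0. Sketch with the scikit-learn stand-in:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in for koho

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

acc = clf.score(X, y)                 # mean accuracy on (X, y)
same = np.mean(clf.predict(X) == y)   # equivalent computation by hand
```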
set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

- Returns
- self : object
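A get_params / set_params round trip, again using scikit-learn's DecisionTreeClassifier as a stand-in for koho's; note that set_params returns the estimator itself, so calls can be chained:

```python
from sklearn.tree import DecisionTreeClassifier  # stand-in for koho

clf = DecisionTreeClassifier()
params = clf.get_params()            # dict: parameter names -> values
assert 'max_depth' in params

clf2 = clf.set_params(max_depth=3)   # returns self, enabling chaining
# In a pipeline, the nested form would be e.g. set_params(tree__max_depth=3)
```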