koho.cpp 1.1.0

koho::DepthFirstTreeBuilder Class Reference

Build a binary decision tree in depth-first order.

#include <decision_tree.h>
Public Member Functions

  DepthFirstTreeBuilder (OutputsIdx_t n_outputs, ClassesIdx_t *n_classes, ClassesIdx_t n_classes_max, FeaturesIdx_t n_features, SamplesIdx_t n_samples, ClassWeights_t *class_weight, TreeDepthIdx_t max_depth, FeaturesIdx_t max_features, unsigned long max_thresholds, std::string missing_values, RandomState const &random_state)
      Create and initialize a new depth-first tree builder.

  void build (Tree &tree, Features_t *X, Classes_t *y, SamplesIdx_t n_samples)
      Build a binary decision tree from the training data.

Protected Attributes

  TreeDepthIdx_t max_depth
  std::string missing_values
  BestSplitter splitter
Build a binary decision tree in depth-first order.
koho::DepthFirstTreeBuilder::DepthFirstTreeBuilder (
    OutputsIdx_t        n_outputs,
    ClassesIdx_t *      n_classes,
    ClassesIdx_t        n_classes_max,
    FeaturesIdx_t       n_features,
    SamplesIdx_t        n_samples,
    ClassWeights_t *    class_weight,
    TreeDepthIdx_t      max_depth,
    FeaturesIdx_t       max_features,
    unsigned long       max_thresholds,
    std::string         missing_values,
    RandomState const & random_state )
Create and initialize a new depth-first tree builder.
Parameters
    [in]  n_outputs       Number of outputs (multi-output), minimum 1.
    [in]  n_classes       Number of classes in the training data for each output, minimum 2 [n_outputs].
    [in]  n_classes_max   Maximum number of classes across all outputs.
    [in]  n_features      Number of features in the training data, minimum 1.
    [in]  n_samples       Number of samples in the training data, minimum 2.
    [in]  class_weight    Class weights for each output separately. The weight for each class should be inversely proportional to the class frequency in the training data for class balancing, or 1.0 otherwise [n_outputs x max(n_classes for each output)].
    [in]  max_depth       The tree is expanded until the specified maximum depth is reached, all leaves are pure, or no further impurity improvement can be achieved.
    [in]  max_features    Number of random features to consider when looking for the best split at each node, between 1 and n_features. Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features (up to the point that all features have been considered).
    [in]  max_thresholds  Number of random thresholds to consider when looking for the best split at each node, 0 or 1. If 0, all thresholds based on the mid-points of the node samples are considered. If 1, one random threshold is considered.
    [in]  missing_values  Handling of missing values: "NMAR" or "None" (default "None").
                          If "NMAR" (Not Missing At Random), then during training the split criterion treats missing values as another category, and samples with missing values are passed to either the left or the right child, whichever provides the best split. During testing, if the split criterion includes missing values, a missing value is handled accordingly (passed to the left or right child); if the split criterion does not include missing values, a missing value at that split is handled by combining the results from both children, proportionally to the number of samples passed to each child during training.
                          If "None", an error is raised if one of the features has a missing value. An alternative is to impute (fill in) missing values before using the decision tree classifier.
    [in]  random_state    Initialized random number generator.
"Decision Tree": max_features=n_features, max_thresholds=0.

The following configurations should only be used for "decision forests":
- "Random Tree": max_features < n_features, max_thresholds=0.
- "Extremely Randomized Trees (ET)": max_features=n_features, max_thresholds=1.
- "Totally Randomized Trees": max_features=1, max_thresholds=1; very similar to "Perfect Random Trees (PERT)".
void koho::DepthFirstTreeBuilder::build (
    Tree &         tree,
    Features_t *   X,
    Classes_t *    y,
    SamplesIdx_t   n_samples )
Build a binary decision tree from the training data.
Parameters
    [in,out]  tree       A binary decision tree.
    [in]      X          Training input samples [n_samples x n_features].
    [in]      y          Target class labels corresponding to the training input samples [n_samples].
    [in]      n_samples  Number of samples, minimum 2.