koho.cpp  1.1.0
koho::DepthFirstTreeBuilder Class Reference

Build a binary decision tree in depth-first order.

#include <decision_tree.h>


Public Member Functions

 DepthFirstTreeBuilder (OutputsIdx_t n_outputs, ClassesIdx_t *n_classes, ClassesIdx_t n_classes_max, FeaturesIdx_t n_features, SamplesIdx_t n_samples, ClassWeights_t *class_weight, TreeDepthIdx_t max_depth, FeaturesIdx_t max_features, unsigned long max_thresholds, std::string missing_values, RandomState const &random_state)
 Create and initialize a new depth-first tree builder.
 
void build (Tree &tree, Features_t *X, Classes_t *y, SamplesIdx_t n_samples)
 Build a binary decision tree from the training data.
 

Protected Attributes

TreeDepthIdx_t max_depth
 
std::string missing_values
 
BestSplitter splitter
 

Detailed Description

Build a binary decision tree in depth-first order.

Constructor & Destructor Documentation

◆ DepthFirstTreeBuilder()

koho::DepthFirstTreeBuilder::DepthFirstTreeBuilder ( OutputsIdx_t        n_outputs,
                                                     ClassesIdx_t *      n_classes,
                                                     ClassesIdx_t        n_classes_max,
                                                     FeaturesIdx_t       n_features,
                                                     SamplesIdx_t        n_samples,
                                                     ClassWeights_t *    class_weight,
                                                     TreeDepthIdx_t      max_depth,
                                                     FeaturesIdx_t       max_features,
                                                     unsigned long       max_thresholds,
                                                     std::string         missing_values,
                                                     RandomState const & random_state )

Create and initialize a new depth-first tree builder.

Parameters
    [in]  n_outputs       Number of outputs (multi-output), minimum 1.
    [in]  n_classes       Number of classes in the training data for each output, minimum 2 [n_outputs].
    [in]  n_classes_max   Maximum number of classes across all outputs.
    [in]  n_features      Number of features in the training data, minimum 1.
    [in]  n_samples       Number of samples in the training data, minimum 2.
    [in]  class_weight    Class weights for each output separately. The weight of each class should be inversely proportional to its frequency in the training data for class balancing, or 1.0 otherwise [n_outputs x max(n_classes for each output)].
    [in]  max_depth       The tree is expanded until the specified maximum depth is reached, all leaves are pure, or no further impurity improvement can be achieved.
    [in]  max_features    Number of random features to consider when looking for the best split at each node, between 1 and n_features.
                          Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this effectively requires inspecting more than max_features features.
    [in]  max_thresholds  Number of random thresholds to consider when looking for the best split at each node, 0 or 1.
                          If 0, all thresholds based on the mid-points of the node samples are considered.
                          If 1, one random threshold is considered.
    [in]  missing_values  Handling of missing values: string "NMAR" or "None" (default="None").
                          If "NMAR" (Not Missing At Random): during training, the split criterion treats missing values as another category, and samples with missing values are passed to the left or the right child depending on which option provides the best split; during testing, a split criterion that includes missing values handles a missing value accordingly (passes it to the left or the right child), while a split criterion that does not include missing values handles it by combining the results from both children proportionally to the number of samples passed to each child during training.
                          If "None", an error is raised if one of the features has a missing value. An option is to impute (fill in) missing values before using the decision tree classifier.
    [in]  random_state    Initialized random number generator.

"Decision Tree": max_features=n_features, max_thresholds=0.

The following configurations should only be used for "decision forests":
"Random Tree": max_features<n_features, max_thresholds=0.
"Extreme Randomized Trees (ET)": max_features=n_features, max_thresholds=1.
"Totally Randomized Trees": max_features=1, max_thresholds=1, very similar to "Perfect Random Trees (PERT)".

Member Function Documentation

◆ build()

void koho::DepthFirstTreeBuilder::build ( Tree &        tree,
                                          Features_t *  X,
                                          Classes_t *   y,
                                          SamplesIdx_t  n_samples )

Build a binary decision tree from the training data.

Parameters
    [in,out]  tree       A binary decision tree.
    [in]      X          Training input samples [n_samples x n_features].
    [in]      y          Target class labels corresponding to the training input samples [n_samples].
    [in]      n_samples  Number of samples, minimum 2.
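
A minimal usage sketch, assuming `builder` is the DepthFirstTreeBuilder constructed above and that `tree` is a koho::Tree instance created as described in the Tree class documentation (its constructor arguments are not shown here); the row-major layout of X is assumed from the [n_samples x n_features] notation.

    #include <vector>

    // X is laid out as [n_samples x n_features] (row-major assumed), y holds one class label per sample.
    std::vector<Features_t> X(n_samples * n_features);
    std::vector<Classes_t>  y(n_samples);
    // ... fill X and y with the training data ...

    builder.build(tree, X.data(), y.data(), n_samples);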

Member Data Documentation

◆ max_depth

TreeDepthIdx_t koho::DepthFirstTreeBuilder::max_depth
protected

◆ missing_values

std::string koho::DepthFirstTreeBuilder::missing_values
protected

◆ splitter

BestSplitter koho::DepthFirstTreeBuilder::splitter
protected
