koho.cpp
1.1.0
Gini Index impurity criterion.
#include <decision_tree.h>
Public Member Functions
  GiniCriterion(OutputsIdx_t n_outputs, ClassesIdx_t *n_classes, ClassesIdx_t n_classes_max, SamplesIdx_t n_samples, ClassWeights_t *class_weight)
      Create and initialize a new Gini criterion.
  void calculate_node_histogram(Classes_t *y, std::vector<SamplesIdx_t> &samples, SamplesIdx_t start, SamplesIdx_t end)
      Calculate weighted class histograms for all outputs for the current node.
  double calculate_impurity(std::vector<Histogram_t> &histogram)
      Calculate the impurity of a weighted class histogram using the Gini criterion.
  void calculate_node_impurity()
      Calculate impurity for all outputs of the current node.
  void calculate_NA_histogram(Classes_t *y, std::vector<SamplesIdx_t> &samples, SamplesIdx_t pos)
      Calculate class histograms for all outputs for the samples with missing values and the samples with values.
  void calculate_NA_impurity()
      Calculate impurity for all outputs of samples with missing values and samples with values.
  double calculate_NA_impurity_improvement()
  void init_threshold_histograms()
  void init_threshold_values_histograms()
  void update_threshold_histograms(Classes_t *y, std::vector<SamplesIdx_t> &samples, SamplesIdx_t new_pos)
  void calculate_threshold_impurity()
      Calculate impurity for all outputs of samples with values that are smaller and greater than a threshold.
  void calculate_threshold_NA_impurity()
  double calculate_threshold_impurity_improvement()
  double calculate_threshold_values_impurity_improvement()
  double calculate_threshold_NA_left_impurity_improvement()
  double calculate_threshold_NA_right_impurity_improvement()
  std::vector<std::vector<Histogram_t>> get_node_weighted_histogram()
  double get_node_impurity()
  double get_node_impurity_NA()
  double get_node_impurity_values()
  double get_node_impurity_threshold_left()
  double get_node_impurity_threshold_right()
Detailed Description
Gini Index impurity criterion.
Constructor & Destructor Documentation
koho::GiniCriterion::GiniCriterion(OutputsIdx_t n_outputs,
                                   ClassesIdx_t *n_classes,
                                   ClassesIdx_t n_classes_max,
                                   SamplesIdx_t n_samples,
                                   ClassWeights_t *class_weight)
Create and initialize a new Gini criterion.
Assuming: y[o] takes the values 0, 1, 2, ..., n_classes[o] - 1 for all outputs o.
Member Function Documentation
double koho::GiniCriterion::calculate_impurity(std::vector<Histogram_t> &histogram)
Calculate the impurity of a weighted class histogram using the Gini criterion.
void koho::GiniCriterion::calculate_NA_histogram(Classes_t *y,
                                                 std::vector<SamplesIdx_t> &samples,
                                                 SamplesIdx_t pos)
Calculate class histograms for all outputs for the samples with missing values and the samples with values.
Assuming: number of missing values > 0
void koho::GiniCriterion::calculate_NA_impurity()
Calculate impurity for all outputs of samples with missing values and samples with values.
Assuming: number of missing values > 0
Assuming: calculate_NA_histogram()
double koho::GiniCriterion::calculate_NA_impurity_improvement()
Calculate the impurity improvement over all outputs from the current node to its children, assuming a split between the samples with missing values and the samples with values.
Assuming: number of missing values > 0
Assuming: calculate_node_impurity(), calculate_NA_impurity()
void koho::GiniCriterion::calculate_node_histogram(Classes_t *y,
                                                   std::vector<SamplesIdx_t> &samples,
                                                   SamplesIdx_t start,
                                                   SamplesIdx_t end)
Calculate weighted class histograms for all outputs for the current node.
void koho::GiniCriterion::calculate_node_impurity()
Calculate impurity for all outputs of the current node.
Assuming: calculate_node_histogram()
void koho::GiniCriterion::calculate_threshold_impurity()
Calculate impurity for all outputs of samples with values that are smaller and greater than a threshold.
Assuming: update_threshold_histograms()
double koho::GiniCriterion::calculate_threshold_impurity_improvement()
Calculate the impurity improvement over all outputs from the current node to its children, assuming a split of the samples with values smaller and greater than a threshold, in the case that all samples have values.
Assuming: calculate_node_impurity(), calculate_threshold_impurity()
void koho::GiniCriterion::calculate_threshold_NA_impurity()
Calculate the impurity for all outputs of the samples with values that are smaller and greater than a threshold, while passing on the samples with missing values.
Assuming: number of missing values > 0
Assuming: update_threshold_histograms(), calculate_NA_histogram()
double koho::GiniCriterion::calculate_threshold_NA_left_impurity_improvement()
Calculate the impurity improvement over all outputs from the current node to its children, assuming a split of the samples with values smaller and greater than a threshold, with the samples with missing values passed on to the left child.
Assuming: calculate_NA_impurity(), calculate_threshold_impurity(), calculate_threshold_NA_impurity()
double koho::GiniCriterion::calculate_threshold_NA_right_impurity_improvement()
Calculate the impurity improvement over all outputs from the current node to its children, assuming a split of the samples with values smaller and greater than a threshold, with the samples with missing values passed on to the right child.
Assuming: calculate_NA_impurity(), calculate_threshold_impurity(), calculate_threshold_NA_impurity()
double koho::GiniCriterion::calculate_threshold_values_impurity_improvement()
Calculate the impurity improvement over all outputs from the current node to its children, assuming a split of the samples with values smaller and greater than a threshold, in the case that there are also samples with missing values.
Assuming: calculate_NA_impurity(), calculate_threshold_impurity()
void koho::GiniCriterion::init_threshold_histograms()
Initialize class histograms for all outputs for applying a threshold on samples with values, in the case that all samples have values.
Assuming: calculate_node_histogram()
void koho::GiniCriterion::init_threshold_values_histograms()
Initialize class histograms for all outputs for applying a threshold on samples with values, in the case that there are also samples with missing values.
Assuming: number of missing values > 0
Assuming: calculate_NA_histogram()
void koho::GiniCriterion::update_threshold_histograms(Classes_t *y,
                                                      std::vector<SamplesIdx_t> &samples,
                                                      SamplesIdx_t new_pos)
Update the class histograms for all outputs for applying a threshold on values, from the current position to the new position (positions correspond to thresholds).
Assuming: new_pos > pos
Assuming: init_threshold_histograms() or init_threshold_values_histograms()