koho 1.1.0

koho::DecisionForestClassifier Class Reference

A decision forest classifier.

#include <decision_forest.h>

Public Member Functions
DecisionForestClassifier(std::vector<std::vector<std::string>> const& classes,
                         std::vector<std::string> const& features,
                         unsigned long n_estimators = 100,
                         bool bootstrap = false,
                         bool oob_score = false,
                         std::string const& class_balance = "balanced",
                         TreeDepthIdx_t max_depth = 3,
                         FeaturesIdx_t max_features = 0,
                         unsigned long max_thresholds = 0,
                         std::string const& missing_values = "None",
                         long random_state_seed = 0)
    Create and initialize a new decision forest classifier.

void fit(std::vector<Features_t>& X, std::vector<Classes_t>& y)
    Build a decision forest classifier from the training data.

void predict_proba(Features_t* X, SamplesIdx_t n_samples, double* y_prob)
    Predict class probabilities for the test data.

void predict(Features_t* X, SamplesIdx_t n_samples, Classes_t* y)
    Predict classes for the test data.

double score(Features_t* X, Classes_t* y, SamplesIdx_t n_samples)
    Calculate the score for the test data.

void calculate_feature_importances(double* importances)
    Calculate feature importances from the decision forest.

void export_graphviz(std::string const& filename, bool rotate = false)
    Export a decision forest as individual decision trees in GraphViz dot format.

void export_graphviz(std::string const& filename, unsigned long e, bool rotate)
    Export a single decision tree from a decision forest in GraphViz dot format.

std::string export_text(unsigned long e)
    Export a single decision tree from a decision forest in a simple text format.

void export_serialize(std::string const& filename)
    Export a decision forest classifier in binary serialized format.

void serialize(std::ofstream& fout)
    Serialize.
Static Public Member Functions

static DecisionForestClassifier import_deserialize(std::string const& filename)
    Import a decision forest classifier from binary serialized format.

static DecisionForestClassifier deserialize(std::ifstream& fin)
    Deserialize.
Protected Attributes

OutputsIdx_t n_outputs
std::vector<std::vector<std::string>> classes
std::vector<ClassesIdx_t> n_classes
ClassesIdx_t n_classes_max
std::vector<std::string> features
FeaturesIdx_t n_features
unsigned long n_estimators
bool bootstrap
bool oob_score
std::string class_balance
TreeDepthIdx_t max_depth
FeaturesIdx_t max_features
unsigned long max_thresholds
std::string missing_values
RandomState random_state
std::vector<DecisionTreeClassifier> dtc_
double oob_score_
A decision forest classifier.
koho::DecisionForestClassifier::DecisionForestClassifier(
        std::vector<std::vector<std::string>> const& classes,
        std::vector<std::string> const& features,
        unsigned long n_estimators = 100,
        bool bootstrap = false,
        bool oob_score = false,
        std::string const& class_balance = "balanced",
        TreeDepthIdx_t max_depth = 3,
        FeaturesIdx_t max_features = 0,
        unsigned long max_thresholds = 0,
        std::string const& missing_values = "None",
        long random_state_seed = 0)
Create and initialize a new decision forest classifier.

Parameters
    [in] classes            Class labels for each output.
    [in] features           Feature names.
    [in] n_estimators       Number of decision trees in the forest. integer (default=100)
                            If 1, the decision forest classifier is a decision tree classifier.
    [in] bootstrap          Whether bootstrap samples are used when building trees. Out-of-bag samples are used to estimate the generalization accuracy. boolean (default=false)
    [in] oob_score          Whether to use out-of-bag samples to estimate the generalization accuracy. boolean (default=false)
    [in] class_balance      Weighting of the classes. string "balanced" or "None" (default="balanced")
                            If "balanced", the values of y are used to automatically adjust class weights inversely proportional to the class frequencies in the input data.
                            If "None", all classes are assumed to have weight one.
    [in] max_depth          The maximum depth of the tree. integer (default=3)
                            The tree is expanded until the specified maximum depth is reached, all leaves are pure, or no further impurity improvement can be achieved.
                            If 0, the maximum depth of the tree is set to max long (2^31-1).
    [in] max_features       Number of random features to consider when looking for the best split at each node, between 1 and n_features. integer (default=0)
                            Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires effectively inspecting more than max_features features.
                            If 0, the number of random features = number of features.
                            Note: only to be used by a decision forest.
    [in] max_thresholds     Number of random thresholds to consider when looking for the best split at each node. integer (default=0)
                            If 0, all thresholds, based on the mid-points of the node samples, are considered.
                            If 1, one random threshold is considered, following the Extremely Randomized Trees formulation.
                            Note: only to be used by a decision forest.
    [in] missing_values     Handling of missing values. string "NMAR" or "None" (default="None")
                            If "NMAR" (Not Missing At Random), then during training the split criterion considers missing values as another category, and samples with missing values are passed to either the left or the right child depending on which option provides the best split. During testing, if the split criterion includes missing values, a missing value is handled accordingly (passed to the left or right child); if the split criterion does not include missing values, a missing value at a split is handled by combining the results from both children, proportionally to the number of samples passed to each child during training.
                            If "None", an error is raised if one of the features has a missing value. An alternative is to impute (fill in) missing values before using the decision tree classifier.
    [in] random_state_seed  Seed used by the random number generator. integer (default=0)
                            If -1, the random number generator is seeded with the current system time.
                            Note: only to be used by a decision forest.
The following configuration corresponds to a single decision tree:
"Decision Tree": n_estimators=1, max_features=n_features, max_thresholds=0.

The following configurations should only be used for decision forests:
"Random Tree": max_features < n_features, max_thresholds=0.
"Extremely Randomized Trees (ET)": max_features=n_features, max_thresholds=1.
"Totally Randomized Trees": max_features=1, max_thresholds=1, very similar to "Perfect Random Trees (PERT)".
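As a usage sketch, these configurations are selected through the constructor arguments. This is an untested illustration written against the documented signature only; the feature and class names, the numeric encoding of y as class indices, and the flattened [n_samples x n_features] layout of X are assumptions:

```cpp
#include <decision_forest.h>

#include <string>
#include <vector>

using namespace koho;

int main() {
    // Hypothetical two-feature, binary-class problem (names are placeholders).
    std::vector<std::vector<std::string>> classes = {{"no", "yes"}};
    std::vector<std::string> features = {"f0", "f1"};

    // "Extremely Randomized Trees": max_features = n_features, max_thresholds = 1.
    DecisionForestClassifier et(classes, features,
                                /*n_estimators=*/100,
                                /*bootstrap=*/false,
                                /*oob_score=*/false,
                                /*class_balance=*/"balanced",
                                /*max_depth=*/3,
                                /*max_features=*/2,    // = n_features
                                /*max_thresholds=*/1,  // ET formulation
                                /*missing_values=*/"None",
                                /*random_state_seed=*/0);

    // Assumed layout: 2 samples x 2 features, flattened row by row,
    // with y holding class indices into `classes`.
    std::vector<Features_t> X = {0.0, 0.0,
                                 1.0, 1.0};
    std::vector<Classes_t> y = {0, 1};
    et.fit(X, y);
    return 0;
}
```

Choosing n_estimators=1, max_features=2 (= n_features) and max_thresholds=0 instead would give the "Decision Tree" configuration from the list above.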
void koho::DecisionForestClassifier::calculate_feature_importances(double* importances)

Calculate feature importances from the decision forest.

Parameters
    [in,out] importances  Feature importances corresponding to all features [n_features]. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature.
static DecisionForestClassifier koho::DecisionForestClassifier::deserialize(std::ifstream& fin)

Deserialize.
void koho::DecisionForestClassifier::export_graphviz(std::string const& filename, bool rotate = false)

Export a decision forest as individual decision trees in GraphViz dot format.

Parameters
    [in] filename  Common filename of the individual decision trees; "filename_<0, ... n_estimators-1>" is used for the individual decision trees, with the extension .gv added.
    [in] rotate    Rotate the display of the decision trees. boolean (default=false)
                   If false, orient the trees top-down. If true, orient the trees left-to-right.

Ubuntu:
    $ sudo apt-get install graphviz
    $ sudo apt-get install xdot
    view:
    $ xdot filename.gv
    create pdf, png:
    $ dot -Tpdf filename.gv -o filename.pdf
    $ dot -Tpng filename.gv -o filename.png

Windows:
    Install graphviz-2.38.msi from http://www.graphviz.org/Download_windows.php
    START > "Advanced System Settings"
    Click "Environment Variables ..."
    Click "Browse..."
    Select "C:/ProgramFiles(x86)/Graphviz2.38/bin"
    view: START > gvedit
void koho::DecisionForestClassifier::export_graphviz(std::string const& filename, unsigned long e, bool rotate)

Export a single decision tree from a decision forest in GraphViz dot format.

Parameters
    [in] filename  Common filename of the individual decision trees; "filename_<0, ... n_estimators-1>" is used for the individual decision trees, with the extension .gv added.
    [in] e         Decision tree index 0, ... n_estimators-1.
    [in] rotate    Rotate the display of the decision tree. boolean (default=false)
                   If false, orient the tree top-down. If true, orient the tree left-to-right.
void koho::DecisionForestClassifier::export_serialize(std::string const& filename)

Export a decision forest classifier in binary serialized format, with separate files for the individual decision trees.

Parameters
    [in] filename  Filename of the decision forest, extension .dfc added; "filename_<0, ... n_estimators-1>" is used for the individual decision trees, extension .dtc added.
std::string koho::DecisionForestClassifier::export_text(unsigned long e)

Export a single decision tree from a decision forest in a simple text format.

Parameters
    [in] e  Decision tree index 0, ... n_estimators-1.
void koho::DecisionForestClassifier::fit(std::vector<Features_t>& X, std::vector<Classes_t>& y)

Build a decision forest classifier from the training data.

Parameters
    [in] X  Training input samples [n_samples x n_features].
    [in] y  Target class labels corresponding to the training input samples [n_samples].
static DecisionForestClassifier koho::DecisionForestClassifier::import_deserialize(std::string const& filename)

Import a decision forest classifier from binary serialized format, with separate files for the individual decision trees.

Parameters
    [in] filename  Filename of the decision forest, extension .dfc added; "filename_<0, ... n_estimators-1>" is used for the individual decision trees, extension .dtc added.
void koho::DecisionForestClassifier::predict(Features_t* X, SamplesIdx_t n_samples, Classes_t* y)

Predict classes for the test data.

Parameters
    [in]     X          Test input samples [n_samples x n_features].
    [in]     n_samples  Number of samples in the test data.
    [in,out] y          Predicted classes for the test input samples [n_samples].

1-D array addressing is used for X and y to support efficient Cython bindings to Python using memory views.
void koho::DecisionForestClassifier::predict_proba(Features_t* X, SamplesIdx_t n_samples, double* y_prob)

Predict class probabilities for the test data.

Parameters
    [in]     X          Test input samples [n_samples x n_features].
    [in]     n_samples  Number of samples in the test data.
    [in,out] y_prob     Class probabilities corresponding to the test input samples [n_samples x n_classes].

n_classes_max is used to create a regular 3-D array holding the predicted values (outputs x samples x classes), as the number of classes can differ between outputs. 1-D array addressing is used for X and y_prob to support efficient Cython bindings to Python using memory views.
double koho::DecisionForestClassifier::score(Features_t* X, Classes_t* y, SamplesIdx_t n_samples)

Calculate the score for the test data.

Parameters
    [in] X          Test input samples [n_samples x n_features].
    [in] y          True classes for the test input samples [n_samples].
    [in] n_samples  Number of samples in the test data.
void koho::DecisionForestClassifier::serialize(std::ofstream& fout)

Serialize.