Most important attributes in Random Forest Classifier

Good afternoon everyone. I would like to know if it is possible to obtain, for each attribute used in training a Random Forest Classifier, a percentage indicating which attributes are the most decisive.

Author: Rodolfo Donato, 2017-12-06

2 answers

Reading the documentation is always a good first step.

In any case, from the manual:

feature_importances_: array of shape = [n_features]
The feature importances (the higher, the more important the feature).


To be extremely clear: you instantiate your model (1), train it (2), and then read off the feature importances (3):

from sklearn.ensemble import RandomForestClassifier

# x: feature matrix, y: labels (your training data)
clf = RandomForestClassifier()      # (1)
clf.fit(x, y)                       # (2)
print(clf.feature_importances_)     # (3)
Author: fmv1992, 2017-12-06 16:45:48

This paper proposes a methodology for analyzing the predictions of this type of algorithm. Fortunately, there is a Python project that implements it.

This link has a tutorial using it with a RandomForest. I am copying the code below in case the link stops working.

from __future__ import print_function

import numpy as np
import sklearn
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import lime
import lime.lime_tabular

np.random.seed(1)

# train the algorithm
iris = sklearn.datasets.load_iris()
train, test, labels_train, labels_test = sklearn.model_selection.train_test_split(iris.data, iris.target, train_size=0.80)
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
rf.fit(train, labels_train)

# explain the predictions
explainer = lime.lime_tabular.LimeTabularExplainer(train, feature_names=iris.feature_names, class_names=iris.target_names, discretize_continuous=True)

i = np.random.randint(0, test.shape[0])
exp = explainer.explain_instance(test[i], rf.predict_proba, num_features=2, top_labels=1)
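If installing `lime` is not an option, scikit-learn itself ships a model-agnostic alternative: permutation importance (`sklearn.inspection.permutation_importance`, available since scikit-learn 0.22). This is a sketch of that alternative approach, not part of the tutorial above; it measures how much the model's held-out score drops when each feature is shuffled:

```python
import numpy as np
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
from sklearn.inspection import permutation_importance

np.random.seed(1)
iris = sklearn.datasets.load_iris()
train, test, labels_train, labels_test = sklearn.model_selection.train_test_split(
    iris.data, iris.target, train_size=0.80, random_state=1)

rf = sklearn.ensemble.RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(train, labels_train)

# Shuffle each feature on the test set n_repeats times and record the score drop;
# features whose shuffling hurts accuracy the most are the most decisive.
result = permutation_importance(rf, test, labels_test, n_repeats=10, random_state=1)
for name, mean in zip(iris.feature_names, result.importances_mean):
    print(f"{name}: {mean:.3f}")
```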
Author: Daniel Falbel, 2017-12-06 19:59:26