sklearn classification report and confusion matrix: do the values not match?

Model: logistic regression with sklearn.

I decided to check the results shown in classification_report by calculating them from the confusion matrix, but apparently the values do not match:

Classification_report:

             precision    recall  f1-score   support

          0       0.54      0.94      0.68     56000
          1       0.96      0.62      0.75    119341

avg / total       0.82      0.72      0.73    175341

Generated confusion matrix:

[[52624  3376]
 [45307 74034]]

My calculations based on the above confusion matrix:

How often, on average, does the model get it right (accuracy)?

(TP + TN)/total

(74034 + 52624)/(52624 + 74034 +45307 + 74034)*100 = 51%

What is the precision P of the model (ratio of the number of TP to the sum of TP and FP)?

74034/(74034 + 3376)*100 = 95%

What is the recall R of the model (ratio of the number of TP to the sum of TP and FN)?

74034/(74034 + 45307)*100 = 62%

As we can see, recall and precision do not match. What's wrong? How should I interpret the results? What do f1-score and support stand for?

Author: Ed S, 2018-06-24

1 answer

I will try to explain the analysis step by step, so that you, or anyone else with the same problem, can understand how to work through these things.

First, I will generate two vectors, target and predicted, which simulate the result of your classification. These vectors are built from the data you posted.

The classification_report says you have 56000 samples of class 0 and 119341 samples of class 1 in your classification. So I'll generate a vector with 56000 zeros and 119341 ones.

import numpy as np

class0 = 56000 
class1 = 119341
total = class0 + class1

target          = np.zeros(total, dtype=int)
target[class0:] = np.ones(class1, dtype=int)

# to prove the values are right
sum(target == 0) == class0, sum(target == 1) == class1

With this, you have the target vector, holding the data your classification should have predicted. Let's now generate predicted, holding what your classifier actually reported. This data was taken from your confusion matrix.

class0_hit  = 52624  # class 0 samples predicted correctly
class0_miss = 3376   # class 0 samples predicted wrongly
class1_miss = 45307  # class 1 samples predicted wrongly
class1_hit  = 74034  # class 1 samples predicted correctly

predicted = np.zeros(total, dtype=int)

# positions [class0_hit, class0_hit + class0_miss + class1_hit) get a 1; the rest stay 0
predicted[class0_hit:class0_hit + class0_miss + class1_hit] = np.ones(class0_miss + class1_hit, dtype=int)

# to prove the values are right
sum(predicted == 0) == class0_hit + class1_miss, sum(predicted == 1) == class0_miss + class1_hit

Now we can run sklearn's classification_report and see what it tells us about these values:

from sklearn.metrics import classification_report
print(classification_report(target, predicted))

             precision    recall  f1-score   support

          0       0.54      0.94      0.68     56000
          1       0.96      0.62      0.75    119341

avg / total       0.82      0.72      0.73    175341

This is exactly the same classification report you pasted. We have reached the same point as you.

Now looking at the confusion matrix:

from sklearn.metrics import confusion_matrix
print(confusion_matrix(target, predicted))

[[52624  3376]
 [45307 74034]]
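
As a side note, remember how sklearn lays this matrix out: rows are the true classes and columns the predicted ones. A minimal sketch, reusing the target and predicted vectors built above, that unpacks the four cells (with class 1 as the positive class):

# rows = true classes, columns = predicted classes:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(target, predicted).ravel()
print(tn, fp, fn, tp)  # 52624 3376 45307 74034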

Still the same. Let's look at what accuracy_score says:

from sklearn.metrics import accuracy_score
accuracy_score(target, predicted)
> 0.7223524446649672

It returns 72%, the same as the classification report. So why does your calculation give 51% accuracy? In your calculation you have:

(TP + TN)/total
(74034 + 52624)/(52624 + 74034 + 45307 + 74034)*100 = 51%

If you look closely, the value 74034 appears twice in the denominator (where 3376 should be). Doing the math with the values defined in the code, it looks like this:

acc = (class0_hit + class1_hit) / total
acc
> 0.7223524446649672

Which matches the value from accuracy_score. Your calculations of precision and recall are also right:

from sklearn.metrics import precision_score
precision_score(target, predicted)
> 0.9563880635576799

from sklearn.metrics import recall_score
recall_score(target, predicted)
> 0.6203567927200208

But why, then, does classification_report show those odd values in its last row? The answer is simple and is in its documentation:

The reported averages are a prevalence-weighted macro-average across classes (equivalent to precision_recall_fscore_support with average= 'weighted').

That is, it does not take a simple average: it weights each class by the number of samples it has when computing the averages.
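
To make this concrete, here is a minimal sketch, using only the counts already defined above, of how the avg / total precision of 0.82 comes about: the per-class precisions are averaged weighted by the support, which is simply the number of true samples of each class (the last column of the report).

# per-class precision, computed straight from the confusion-matrix counts
precision_class0 = class0_hit / (class0_hit + class1_miss)  # ~0.54 (row "0" of the report)
precision_class1 = class1_hit / (class1_hit + class0_miss)  # ~0.96 (row "1" of the report)

# average the per-class values, weighting each class by its support
weighted_precision = (class0 * precision_class0 + class1 * precision_class1) / total
# ~0.8226 -- the avg / total precision of the report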

Let's take a look at the precision_recall_fscore_support function. It has a parameter called average that controls how the calculation is done. Running it with the same parameter that classification_report uses, we get the same result:

from sklearn.metrics import precision_recall_fscore_support
precision_recall_fscore_support(target, predicted, average='weighted')
> (0.8225591977440773, 0.7223524446649672, 0.7305824989909749, None)

Now, since your classification has only 2 classes, the right thing is to ask it to calculate with average='binary'. Changing the parameter, we get:

precision_recall_fscore_support(target, predicted, average='binary')
> (0.9563880635576799, 0.6203567927200208, 0.75256542533456, None)

Which is exactly the result we get from sklearn's individual metric functions or by doing the calculation by hand.
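
For completeness, a minimal sketch of that hand calculation, using the counts defined earlier and treating class 1 as the positive class (which is what average='binary' does by default); it also shows that the f1-score is just the harmonic mean of precision and recall:

tp = class1_hit   # 74034 true positives  (class 1 predicted as 1)
fp = class0_miss  #  3376 false positives (class 0 predicted as 1)
fn = class1_miss  # 45307 false negatives (class 1 predicted as 0)

precision = tp / (tp + fp)                                  # ~0.9564
recall    = tp / (tp + fn)                                  # ~0.6204
f1        = 2 * precision * recall / (precision + recall)   # ~0.7526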

Author: Begnini, 2018-07-05 13:23:36