Select a parameter that maximizes the F-measure

I choose an integer parameter k that sets the classification threshold T, so that T = 0.1k.

There are three algorithms. For each of them I need to find the k at which the F-measure (f1_score) is maximal.

I wrote code that prints all the values, but then I have to pick out the maximum by eye. How can I get the maximizing threshold directly?

import numpy as np
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score

k = np.arange(1, 11, 1)
for i in k:
    T = 0.1 * i  # threshold T = 0.1*k
    for actual, predicted, descr in zip([actual_1, actual_10, actual_11],
                                        [predicted_1 > T, predicted_10 > T, predicted_11 > T],
                                        ["Typical:", "Avoids FP:", "Avoids FN:"]):
        print(descr, i, "f1 =", f1_score(actual, predicted))

The initial data was as follows:

actual_1 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
                     1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
predicted_1 = np.array([0.41310733, 0.43739138, 0.22346525, 0.46746017, 0.58251177,
                        0.38989541, 0.43634826, 0.32329726, 0.01114812, 0.41623557,
                        0.54875741, 0.48526472, 0.21747683, 0.05069586, 0.16438548,
                        0.68721238, 0.72062154, 0.90268312, 0.46486043, 0.99656541,
                        0.59919345, 0.53818659, 0.8037637,  0.272277,   0.87428626,
                        0.79721372, 0.62506539, 0.63010277, 0.35276217, 0.56775664])

actual_10 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
                      1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
predicted_10 = np.array([0.29340574, 0.47340035, 0.1580356,  0.29996772, 0.24115457,
                         0.16177793, 0.35552878, 0.18867804, 0.38141962, 0.20367392,
                         0.26418924, 0.16289102, 0.27774892, 0.32013135, 0.13453541,
                         0.39478755, 0.96625033, 0.47683139, 0.51221325, 0.48938235,
                         0.57092593, 0.21856972, 0.62773859, 0.90454639, 0.19406537,
                         0.32063043, 0.4545493,  0.57574841, 0.55847795])

actual_11 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
                      1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
predicted_11 = np.array([0.35929566, 0.61562123, 0.71974688, 0.24893298, 0.19056711,
                         0.89308488, 0.71155538, 0.00903258, 0.51950535, 0.72153302,
                         0.45936068, 0.20197229, 0.67092724, 0.81111343, 0.65359427,
                         0.70044585, 0.61983513, 0.84716577, 0.8512387,  0.86023125,
                         0.7659328,  0.70362246, 0.70127618, 0.8578749,  0.83641841,
                         0.62959491, 0.90445368])

Author: MaxU, 2019-10-06

1 answer

Use scipy.optimize.minimize().

Example:

from scipy.optimize import minimize
from sklearn.metrics import f1_score

def f(k, y_true, y_pred_proba):
    # minimize() searches for a minimum, so return the negated F1:
    # minimizing -F1 over the threshold k maximizes F1.
    return -f1_score(y_true, y_pred_proba >= k)

# Start each search from the threshold 0.5.
res_1 = minimize(f, [0.5], (actual_1, predicted_1), method="Nelder-Mead", tol=1e-5)
res_10 = minimize(f, [0.5], (actual_10, predicted_10), method="Nelder-Mead", tol=1e-5)
res_11 = minimize(f, [0.5], (actual_11, predicted_11), method="Nelder-Mead", tol=1e-5)

Results:

In [91]: res_1
Out[91]:
 final_simplex: (array([[0.5      ],
       [0.5000061]]), array([-0.82758621, -0.82758621]))
           fun: -0.8275862068965518
       message: 'Optimization terminated successfully.'
          nfev: 38
           nit: 13
        status: 0
       success: True
             x: array([0.5])

In [92]: res_10
Out[92]:
 final_simplex: (array([[0.45     ],
       [0.4499939]]), array([-0.76923077, -0.76923077]))
           fun: -0.7692307692307692
       message: 'Optimization terminated successfully.'
          nfev: 42
           nit: 15
        status: 0
       success: True
             x: array([0.45])

In [93]: res_11
Out[93]:
 final_simplex: (array([[0.525    ],
       [0.5250061]]), array([-0.78787879, -0.78787879]))
           fun: -0.787878787878788
       message: 'Optimization terminated successfully.'
          nfev: 37
           nit: 13
        status: 0
       success: True
             x: array([0.525])
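
The fun field is the minimized objective, i.e. the negated F1-measure, so the maximal F1 for each dataset can be read back by negating it; a small sketch using the results above:

# fun holds the minimized objective (-F1), so negate it to get the best F1
print(-res_1.fun)   # ~0.8276 for "Typical"
print(-res_10.fun)  # ~0.7692 for "Avoids FP"
print(-res_11.fun)  # ~0.7879 for "Avoids FN"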

We are interested in the value with the key "x":

In [94]: res_10["x"]
Out[94]: array([0.45])

In [95]: res_11["x"]
Out[95]: array([0.525])
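
As a sanity check, applying the found thresholds back to the data from the question should reproduce the same F1 values; a small sketch:

from sklearn.metrics import f1_score

# Binarize the probabilities with the optimized thresholds and recompute F1;
# each value should match the corresponding -res.fun above.
print(f1_score(actual_1, predicted_1 >= res_1.x[0]))     # ~0.8276
print(f1_score(actual_10, predicted_10 >= res_10.x[0]))  # ~0.7692
print(f1_score(actual_11, predicted_11 >= res_11.x[0]))  # ~0.7879
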
Author: MaxU, 2019-10-06 14:51:45