Select a parameter that maximizes the F-measure
I choose an integer parameter k that sets the classification threshold T via T = 0.1k. There are three classifiers, and for each of them I need to find the k at which the F-measure (f1_score) is maximal.
I wrote code that prints all the scores, but then I have to scan them by eye. How can I find the maximum directly?
import numpy as np
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score

for i in np.arange(1, 11):
    T = 0.1 * i
    for actual, predicted, descr in zip([actual_1, actual_10, actual_11],
                                        [predicted_1 > T, predicted_10 > T, predicted_11 > T],
                                        ["Typical:", "Avoids FP:", "Avoids FN:"]):
        print(descr, i, "f1 =", f1_score(actual, predicted))
The initial data was as follows:
actual_1 = np.array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1.])
predicted_1 = np.array([ 0.41310733, 0.43739138, 0.22346525, 0.46746017, 0.58251177,
0.38989541, 0.43634826, 0.32329726, 0.01114812, 0.41623557,
0.54875741, 0.48526472, 0.21747683, 0.05069586, 0.16438548,
0.68721238, 0.72062154, 0.90268312, 0.46486043, 0.99656541,
0.59919345, 0.53818659, 0.8037637 , 0.272277 , 0.87428626,
0.79721372, 0.62506539, 0.63010277, 0.35276217, 0.56775664])
actual_10 = np.array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1.])
predicted_10 = np.array([ 0.29340574, 0.47340035, 0.1580356 , 0.29996772, 0.24115457, 0.16177793,
0.35552878, 0.18867804, 0.38141962, 0.20367392, 0.26418924, 0.16289102,
0.27774892, 0.32013135, 0.13453541, 0.39478755, 0.96625033, 0.47683139,
0.51221325, 0.48938235, 0.57092593, 0.21856972, 0.62773859, 0.90454639, 0.19406537,
0.32063043, 0.4545493 , 0.57574841, 0.55847795 ])
actual_11 = np.array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
predicted_11 = np.array([ 0.35929566, 0.61562123, 0.71974688, 0.24893298, 0.19056711, 0.89308488,
0.71155538, 0.00903258, 0.51950535, 0.72153302, 0.45936068, 0.20197229, 0.67092724,
0.81111343, 0.65359427, 0.70044585, 0.61983513, 0.84716577, 0.8512387 ,
0.86023125, 0.7659328 , 0.70362246, 0.70127618, 0.8578749 , 0.83641841,
0.62959491, 0.90445368])
1 Answer
Use scipy.optimize.minimize(): minimizing the negative F1 score is the same as maximizing F1.
Example:
from scipy.optimize import minimize

def f(k, y_true, y_pred_proba):
    # minimize the negative F1 score, i.e. maximize F1
    return -f1_score(y_true, y_pred_proba >= k)

res_1 = minimize(f, [0.5], args=(actual_1, predicted_1), method="Nelder-Mead", tol=1e-5)
res_10 = minimize(f, [0.5], args=(actual_10, predicted_10), method="Nelder-Mead", tol=1e-5)
res_11 = minimize(f, [0.5], args=(actual_11, predicted_11), method="Nelder-Mead", tol=1e-5)
Results:
In [91]: res_1
Out[91]:
final_simplex: (array([[0.5 ],
[0.5000061]]), array([-0.82758621, -0.82758621]))
fun: -0.8275862068965518
message: 'Optimization terminated successfully.'
nfev: 38
nit: 13
status: 0
success: True
x: array([0.5])
In [92]: res_10
Out[92]:
final_simplex: (array([[0.45 ],
[0.4499939]]), array([-0.76923077, -0.76923077]))
fun: -0.7692307692307692
message: 'Optimization terminated successfully.'
nfev: 42
nit: 15
status: 0
success: True
x: array([0.45])
In [93]: res_11
Out[93]:
final_simplex: (array([[0.525 ],
[0.5250061]]), array([-0.78787879, -0.78787879]))
fun: -0.787878787878788
message: 'Optimization terminated successfully.'
nfev: 37
nit: 13
status: 0
success: True
x: array([0.525])
We are interested in the value under the key "x" (the optimal threshold):
In [94]: res_10["x"]
Out[94]: array([0.45])
In [95]: res_11["x"]
Out[95]: array([0.525])
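One caveat: F1 is piecewise constant in the threshold, so a gradient-free optimizer like Nelder-Mead can stall on a flat region. Since the question restricts T to 0.1k with integer k from 1 to 10, an exhaustive grid search over those ten candidates is a simple, robust alternative. A minimal sketch, using small toy arrays as stand-ins for one of the (actual, predicted) pairs above:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy stand-ins for one of the (actual_*, predicted_*) pairs from the question
actual = np.array([0., 0., 0., 1., 1., 1.])
predicted = np.array([0.2, 0.4, 0.6, 0.5, 0.7, 0.9])

ks = np.arange(1, 11)  # candidate integer multipliers, T = 0.1 * k
# zero_division=0 silences the warning when a high threshold predicts no positives
scores = [f1_score(actual, predicted > 0.1 * k, zero_division=0) for k in ks]

best_k = ks[np.argmax(scores)]  # k with the highest F1
print("best k =", best_k, "f1 =", max(scores))
```

The same loop applied to each of the three (actual, predicted) pairs gives the optimal k per classifier directly, with no optimizer involved.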