How do I train a model when using KFold cross-validation?
After splitting the data with "sklearn.cross_validation.KFold" (now sklearn.model_selection.KFold) I have 6 chunks (3 train, 3 test, plus the answers for them). Is there a function that takes all the chunks at once, or do I have to keep writing:
Vasya = model.fit(chank1, answer1)
a1 = Vasya.predict(chank_t_1)
?
2 answers
If you want to learn how to use KFold, here is a small example:
from sklearn.model_selection import KFold

kf = KFold(n_splits=N)  # N is the number of folds
for train, test in kf.split(X):
    print("%s %s" % (train, test))
    X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]
    model.fit(X_train, y_train)
    ...
The simpler option:
from sklearn.model_selection import KFold, cross_val_score

kf = KFold(n_splits=N)
results = cross_val_score(model, X, y, cv=kf)  # one score per fold
Cross-validation is built into sklearn. If you need to evaluate the model on the different folds of a KFold split, the easiest way is cross_val_score or cross_val_predict:
- cross_val_score(model, chank1, answer1, cv=n) will give the score for each fold
- cross_val_predict(model, chank1, answer1, cv=n) will give a prediction for every sample in X
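A runnable sketch of the two calls, using the iris toy dataset and a logistic regression in place of the asker's chunks (the dataset and model choice here are illustrative, not from the question):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict

X, y = load_iris(return_X_y=True)          # 150 samples, 3 classes
model = LogisticRegression(max_iter=1000)
kf = KFold(n_splits=3, shuffle=True, random_state=0)

scores = cross_val_score(model, X, y, cv=kf)   # one accuracy score per fold
preds = cross_val_predict(model, X, y, cv=kf)  # one prediction per sample

print(scores.shape)  # (3,)  - one score for each of the 3 folds
print(preds.shape)   # (150,) - a prediction for every row of X
```

Note that cross_val_predict returns each sample's prediction from the fold in which it was in the test set, so you get exactly one prediction per row of X.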
But usually cross-validation is used to select hyperparameters; for that there is GridSearchCV, which takes a grid of parameters and picks the best combination by itself.
NB: all these functions have an n_jobs parameter, so it's not worth writing the loops by hand — n_jobs=-1 will keep the machine busy by loading all the cores while the work is running, which is not so easy to do in plain Python.
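A minimal GridSearchCV sketch showing both the parameter grid and n_jobs=-1 (the SVC model and the grid values are illustrative assumptions, not from the question):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid of hyperparameter values to try; every combination is cross-validated.
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# cv=3 folds; n_jobs=-1 runs the fits in parallel on all available cores.
search = GridSearchCV(SVC(), grid, cv=3, n_jobs=-1)
search.fit(X, y)

print(search.best_params_)  # the best combination found on the grid
print(search.best_score_)   # its mean cross-validated score
```

After fitting, search itself behaves like a model refit on the full data with the best parameters, so you can call search.predict directly.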