scikit-learn

How to sort GridSearchCV.cv_results_

I'm taking a data science course that uses the sklearn library, which has a GridSearchCV class. The problem is th ... ain_fin, y_train_fin) sorted(gridsearch.cv_results_,key = lambda x: -x.mean_validation_score) What could be the problem?
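
In current scikit-learn, `cv_results_` is a dict of equal-length arrays keyed by names such as `mean_test_score`. Iterating over it with `sorted()` yields its string keys, which have no `mean_validation_score` attribute (that attribute belonged to the old `grid_scores_` list, removed in version 0.20). A minimal sketch, using `load_iris` and a small tree grid as stand-ins for the question's data and model, loads the results into a pandas DataFrame and sorts by the mean cross-validated score:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for X_train_fin / y_train_fin from the question.
X, y = load_iris(return_X_y=True)

gridsearch = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4]},
    cv=5,
)
gridsearch.fit(X, y)

# cv_results_ is a dict of arrays with one entry per parameter candidate.
results = pd.DataFrame(gridsearch.cv_results_)

# Sort candidates by mean cross-validated score, best first.
ranked = results.sort_values("mean_test_score", ascending=False)
print(ranked[["params", "mean_test_score", "rank_test_score"]])
```

The `rank_test_score` column already holds the ranking (1 = best), so sorting by it is an equivalent option.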

How can I encode categorical features containing NaN without adding a new category?

For example, a feature that takes the values {'Male', 'Female', NaN}, when using OneHotEncoder (or some other means), transla ... such a dataset should be obtained regardless of whether the data the encoder was trained on contained NaN in some categories.
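
One possible approach, swapping in pandas instead of OneHotEncoder: fix the category list up front with `pd.Categorical`, so the output columns never depend on what the training data contained, and let `pd.get_dummies` encode NaN as an all-zero row (with its default `dummy_na=False`, missing values get no column of their own). The category list comes from the example; the helper name `encode_sex` is hypothetical.

```python
import numpy as np
import pandas as pd

# Categories are fixed up front (taken from the example), so the output
# columns do not depend on whether the training data contained NaN.
CATEGORIES = ["Female", "Male"]


def encode_sex(values):
    """One-hot encode, mapping NaN to an all-zero row instead of a new column."""
    cat = pd.Categorical(values, categories=CATEGORIES)
    # dummy_na=False (the default): missing values get no indicator column.
    return pd.get_dummies(cat, dtype=int)


encoded = encode_sex(["Male", "Female", np.nan])
print(encoded.to_numpy().tolist())  # [[0, 1], [1, 0], [0, 0]]
```

Any value outside `CATEGORIES` also becomes NaN inside the Categorical, so unseen labels are encoded as all zeros too.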

How do I add micro avg to the classification report from sklearn.metrics?

I print the computed metrics for the test data: print(classification_report(y_true, y_pred_classes, target_names = CLAS ... 0.92 0.92 0.92 10000 The micro avg row is missing from this report. How do I change the output to add this line?
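
For single-label multiclass data, recent scikit-learn versions print an `accuracy` row instead of `micro avg`, because micro-averaged precision, recall and F1 are then all equal to accuracy (the micro row is shown for multi-label data or when `labels=` selects a subset of classes). A minimal sketch with toy labels, computing the micro averages explicitly with `precision_recall_fscore_support`:

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_recall_fscore_support,
)

# Toy multiclass labels standing in for y_true / y_pred_classes.
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1]

print(classification_report(y_true, y_pred))

# Micro-averaged metrics, computed explicitly.
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
print(f"micro avg  precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# For single-label multiclass data these coincide with accuracy.
acc = accuracy_score(y_true, y_pred)
print(f"accuracy   {acc:.2f}")
```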

Functions (metrics) for assessing the quality of classification

How can I use sklearn or numpy to find the fraction of wrongly predicted values? There are two arrays of numbers of the same ... d to compare them, and divide the number of incorrect answers by the length. Can this be done with a single function?
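
Both libraries can do this in one call. A small sketch with made-up arrays: in NumPy the mean of the element-wise `!=` mask is the error rate, and scikit-learn's `zero_one_loss` returns the same fraction by default (`normalize=True`), which equals `1 - accuracy_score`.

```python
import numpy as np
from sklearn.metrics import zero_one_loss

y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0])

# NumPy: element-wise comparison, then the mean of the boolean mask.
error_rate_np = np.mean(y_true != y_pred)

# scikit-learn: fraction of misclassified samples (normalize=True by default).
error_rate_sk = zero_one_loss(y_true, y_pred)

print(error_rate_np, error_rate_sk)  # 0.4 0.4
```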

Different output values with the same parameters when classifying data

I am tuning the parameters of a classification model to get the best fit. I do it like this: print('Initial training quality: ... ий: ', accuracy_score(res3, y_test3)) The results come out different. What am I doing wrong?
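
The code is truncated, so the cause is a guess; one common reason (an assumption here) is unseeded randomness: `train_test_split` shuffles, and many estimators such as `RandomForestClassifier` are randomized, so each run differs unless `random_state` is fixed. A sketch on the bundled breast-cancer dataset (not the asker's data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)


def run_once(seed):
    # Fixing random_state in both the split and the model makes runs repeatable.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))


# Same seed gives an identical score; a different seed may not.
print(run_once(0), run_once(0), run_once(1))
```

Note that `accuracy_score` expects `(y_true, y_pred)`; for accuracy the order does not change the value, but for other metrics it does.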

Classification methods in machine learning

I have a classification task: for training, the classifier receives an array of strings as the class and some numbers ... the problem, because I don't yet know any methods that assign several classes to one object. I'd be grateful for any advice!
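
When one object can belong to several classes at once, the task is usually called multi-label classification. A minimal sketch, assuming made-up numeric features and string labels: `MultiLabelBinarizer` turns each object's label set into a 0/1 row, and `OneVsRestClassifier` trains one binary classifier per class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data: numeric features and a *set* of string labels per object.
X = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 1.0], [0.2, 0.9], [0.8, 0.9], [0.9, 1.0]])
y = [["a"], ["a"], ["b"], ["b"], ["a", "b"], ["a", "b"]]

# Binary indicator matrix: one column per class.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y)

# One independent binary classifier per class (column of Y).
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)

pred = clf.predict(np.array([[0.95, 0.95]]))
print(mlb.inverse_transform(pred))  # tuple of predicted label strings
```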

Sampling and cross-validation

Tell me: I have a df... If I'm going to use cross-validation, is it enough to split my df into training and test samples, without extracting a separate validation set? Or am I misunderstanding something?
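
A sketch of the usual workflow, assuming iris as toy data and a kNN grid chosen only for illustration: the test set is held out once, and k-fold cross-validation on the training part lets each fold act as the validation set in turn, so no separate fixed validation set is needed for tuning.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1) Hold out a test set once; it is never used for tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2) Cross-validation on the training part plays the role of a validation set:
#    each of the 5 folds is the validation fold once.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X_train, y_train)

# 3) One final evaluation on the untouched test set.
print(search.best_params_, search.score(X_test, y_test))
```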

Why is there such a big difference in accuracy between the Gini and entropy criteria?

Hello everyone. I am slowly continuing to study ML and have reached the well-known 'Wine' dataset. I ran into the following: if I use e ... rong (for example, I calculated the accuracy)? I read the theory and did not find any reason to expect such big differences.
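
The Wine dataset has only 178 samples, so a single train/test split with an unseeded tree can show large, mostly random gaps between criteria. A sketch of a fairer comparison (the fold count and seeds are arbitrary choices): both criteria get the same `random_state` and the same stratified folds, and the spread is reported alongside the mean.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)  # 178 samples, 13 features, 3 classes

# Identical folds for both criteria, so the comparison is like-for-like.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    scores = cross_val_score(tree, X, y, cv=cv)
    # Report the spread as well as the mean: one split can be misleading.
    print(f"{criterion:8s} mean={scores.mean():.3f} std={scores.std():.3f}")
```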

Logistic regression in Python

Here is the code from the course: y_pred_train=logreg.predict(x_train) y_predict_train=logreg.predict_proba(x_train)[:,1] lo ... that is, no matter how many times predict() is called, the coefficients or weights stay the same. But I'm not so sure about that)
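
In scikit-learn, only `fit()` sets `coef_` and `intercept_`; `predict()` and `predict_proba()` just read them. A quick check, using the bundled breast-cancer data as a stand-in for the course data:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
x_train = StandardScaler().fit_transform(X)  # scaling helps the solver converge

logreg = LogisticRegression().fit(x_train, y)
coef_before = logreg.coef_.copy()

# Prediction only reads the fitted weights; it never updates them.
y_pred_train = logreg.predict(x_train)
y_predict_train = logreg.predict_proba(x_train)[:, 1]
logreg.predict(x_train)  # calling it again changes nothing

print(np.array_equal(coef_before, logreg.coef_))  # True
```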

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

import pandas as pd import numpy as np from sklearn.tree import DecisionTreeClassifier from sklearn.model_selection import tr ... an error. ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
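
The code is truncated, so the exact cause is not visible. A frequent one is using a Series or DataFrame where Python needs a single boolean: in an `if`, or when combining conditions with `and`/`or`. A sketch with a hypothetical frame (column names are illustrative only): use element-wise `&`/`|` with parentheses, and reduce with `.any()`/`.all()` when one boolean is really needed.

```python
import pandas as pd

# Hypothetical frame; column names are illustrative only.
df = pd.DataFrame({"age": [22, 35, 58], "income": [1.5, 3.2, 2.1]})

# Wrong: `and` asks each Series for a single True/False and raises
# "The truth value of a Series is ambiguous".
# mask = df["age"] > 30 and df["income"] > 2

# Right: element-wise `&` (or `|`), each condition in parentheses.
mask = (df["age"] > 30) & (df["income"] > 2)
print(df[mask])

# Where a single boolean is needed, reduce explicitly.
if mask.any():
    print("at least one row matches")
```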

Python: average relative approximation error of a regression

I'm new to Python and don't quite understand how to calculate the average relative approximation error using the formula ... ready-made function for getting this error, or is it just a loop? If it is a loop, do I need to normalize the Y_test (real) values?
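
The formula in the question is truncated; assuming the usual definition, the average relative approximation error is

A = (1/n) · Σ |(y_i − ŷ_i) / y_i| · 100%.

With NumPy this is one vectorized expression, so no loop is needed, and because each term is already relative to y_i, no prior normalization is required (only that no true value is zero). scikit-learn also provides `sklearn.metrics.mean_absolute_percentage_error` (version 0.24+), which returns the same quantity as a fraction rather than a percentage. The helper name below is hypothetical.

```python
import numpy as np


def mean_relative_error(y_true, y_pred):
    """Average relative approximation error, in percent.

    Assumes A = mean(|(y - y_hat) / y|) * 100 and no zero true values.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Vectorized over the whole array: no Python loop needed.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


print(mean_relative_error([100, 200, 50], [110, 180, 50]))  # about 6.67
```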

The k-nearest neighbors (kNN) algorithm in ML

Is it possible, and if so how, to add a condition so that the label prediction knn.predict(x_test) happens only ... _score(knn, X, Y, cv=LeaveOneOut()) print(scores.mean()) Or is there another method more suitable for this purpose? Thanks!
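
The exact condition is cut off in the excerpt, so this sketch assumes one plausible version: return a label only when a large enough share of the neighbors agree, otherwise a reject marker. With uniform weights, `predict_proba` of `KNeighborsClassifier` gives each class's share of the neighbor votes. The helper name, `min_share` and `reject_label` are hypothetical, not scikit-learn options.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, Y)


def predict_if_confident(model, x_test, min_share=0.8, reject_label=-1):
    """Predict a label only where at least `min_share` of the neighbors agree."""
    proba = model.predict_proba(x_test)            # vote share per class
    labels = model.classes_[proba.argmax(axis=1)]  # majority label per sample
    confident = proba.max(axis=1) >= min_share
    return np.where(confident, labels, reject_label)


print(predict_if_confident(knn, X[:5]))
```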

Nonlinear regression by the Gauss-Newton method

I need to implement nonlinear regression for a circular point cloud. There is a point cloud in 3D circular cross-s ... ud. Which direction should I look in for a solution, and are there any example implementations of nonlinear regression?
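
A minimal Gauss-Newton sketch, assuming each cross-section has already been projected to 2D points (x_i, y_i) and the model is a circle with center (a, b) and radius R. The residuals are r_i = sqrt((x_i − a)² + (y_i − b)²) − R; each iteration solves J·Δ = −r in the least-squares sense and updates the parameters. The start point (centroid and mean distance) and the synthetic arc are illustrative choices. For production use, `scipy.optimize.least_squares` solves the same problem with more robust methods.

```python
import numpy as np


def fit_circle_gauss_newton(x, y, n_iter=50, tol=1e-10):
    """Fit a circle (a, b, R) to 2-D points with plain Gauss-Newton."""
    # Initial guess: centroid of the points and mean distance to it.
    a, b = x.mean(), y.mean()
    R = np.hypot(x - a, y - b).mean()
    params = np.array([a, b, R])

    for _ in range(n_iter):
        a, b, R = params
        d = np.hypot(x - a, y - b)  # distance of each point to the center
        r = d - R                   # residuals
        # Jacobian of r with respect to (a, b, R).
        J = np.column_stack([-(x - a) / d, -(y - b) / d, -np.ones_like(d)])
        # Gauss-Newton step: least-squares solution of J @ step = -r.
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        params = params + step
        if np.linalg.norm(step) < tol:
            break
    return params


# Synthetic check: a 270-degree arc of a circle with center (1, 2), radius 3.
theta = np.linspace(0.0, 1.5 * np.pi, 60)
x = 1.0 + 3.0 * np.cos(theta)
y = 2.0 + 3.0 * np.sin(theta)
print(fit_circle_gauss_newton(x, y))  # close to [1. 2. 3.]
```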

Select a parameter that maximizes the F-measure

I am choosing an integer parameter k that scales the classification threshold T, i.e. T = 0.1k. There are three algorithm ... 6023125, 0.7659328 , 0.70362246, 0.70127618, 0.8578749 , 0.83641841, 0.62959491, 0.90445368])
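
A sketch of the search, assuming binary labels and predicted probabilities of the positive class: for each k, threshold the probabilities at T = 0.1k, compute F1, and keep the best k. For three algorithms, call it once per algorithm's probability array. The toy arrays are made up, not the values from the question, and the helper name is hypothetical.

```python
import numpy as np
from sklearn.metrics import f1_score


def best_threshold_multiplier(y_true, proba, ks=range(1, 10)):
    """Return (k, F1) for the k whose threshold T = 0.1 * k maximizes F1."""
    ks = list(ks)
    scores = [
        # zero_division=0 avoids a warning when no sample is predicted positive.
        f1_score(y_true, (proba >= 0.1 * k).astype(int), zero_division=0)
        for k in ks
    ]
    i = int(np.argmax(scores))  # first k with the highest F1
    return ks[i], scores[i]


# Toy labels and predicted probabilities.
y_true = np.array([0, 0, 1, 1])
proba = np.array([0.15, 0.45, 0.35, 0.85])
print(best_threshold_multiplier(y_true, proba))  # best k is 2 here
```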

Python: ValueError: too many values to unpack (expected 2)

I'm trying to find the best parameters for the model using GridSearchCV, and I want to use the data for April as cross validat ... s_ model.best_params_ When I run the code, this error occurs. Could you tell me what the problem might be? Thank you
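
The code is truncated, so this is one likely cause rather than a diagnosis: `cv` must be an iterable of `(train_indices, test_indices)` pairs. Passing a single bare tuple `(train_idx, val_idx)` makes GridSearchCV iterate over the two index arrays themselves and try to unpack each one into two values, which can produce this error. A sketch with a hypothetical `month` column, where April (4) is the validation period, shows two working forms: a list with one pair, or `PredefinedSplit` (where -1 means "always in training").

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Hypothetical data with a 'month' column; April (4) is the validation period.
rng = np.random.default_rng(0)
df = pd.DataFrame({"month": np.repeat([1, 2, 3, 4], 25), "x": rng.normal(size=100)})
df["y"] = 2 * df["x"] + rng.normal(scale=0.1, size=100)
X, y = df[["x"]], df["y"]

is_val = (df["month"] == 4).to_numpy()
train_idx, val_idx = np.where(~is_val)[0], np.where(is_val)[0]

# Wrap the single (train, validation) pair in a list.
cv = [(train_idx, val_idx)]
# Equivalent: -1 = never in a test fold, 0 = member of validation fold 0.
cv_alt = PredefinedSplit(np.where(is_val, 0, -1))

model = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0]}, cv=cv)
model.fit(X, y)
print(model.best_params_)
```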

How to train a model if I used KFold cross-validation

After splitting the data with "sklearn.cross_validation.KFold" I have 6 chunks (3 train, 3 test, plus the answers for them). Is ... all the chunks, or do I have to keep writing: Vasya=model.fit(chank1,answer1) a1=model.predict(Vasya,answer_t_1)?
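
`sklearn.cross_validation` was removed in scikit-learn 0.20; `sklearn.model_selection.KFold` yields index arrays instead of pre-cut chunks. A sketch on iris (toy data): loop over the folds, fit a fresh copy of the model on each training fold, and call `predict()` with features only; the answers are used afterwards for scoring. `cross_val_score` does the same loop in one call.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)
kf = KFold(n_splits=3, shuffle=True, random_state=0)

# Manual loop: a fresh copy of the model per fold, trained on that fold only.
fold_scores = []
for train_idx, test_idx in kf.split(X):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    # predict() takes features only; the answers are used to score the result.
    y_pred = fold_model.predict(X[test_idx])
    fold_scores.append(np.mean(y_pred == y[test_idx]))

# The same three accuracy scores in one call.
auto_scores = cross_val_score(model, X, y, cv=kf)
print(fold_scores, auto_scores)
```

Cross-validation only estimates quality; for the final model, one common choice is to fit once on all the training data.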

Python Anaconda: 1) installation; 2) need for Machine Learning

Two questions about Python Anaconda on Ubuntu 16.04. Do I need to remove the existing Python and libraries (pandas, numpy ... the PA is valid will it make life much easier in this respect? The questions are simple, so yes/no answers are fine.

Cross-validation

Here is my code; the cross-validation is computed slightly wrong, please help me fix it: import numpy as np from pandas import DataFrame import pa ... st[:9]) The result is always just this: {'AdaBoostClassifier': 1.0} Original dataset: https://ru.files.fm/u/aempdy95

Missing sklearn.cross_validation module

Error: the example used the sklearn.cross_validation module. The module is missing, and the program does not work. I ... =1): 8 print('{:^9} {} {!s:^25}'.format(iteration, data[0], data[1])) TypeError: 'KFold' object is not iterable
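
In the removed `sklearn.cross_validation` module, `KFold(n, n_folds=...)` was itself iterable. Its replacement, `sklearn.model_selection.KFold(n_splits=...)`, is iterated through `.split(X)`, which yields `(train_indices, test_indices)` pairs. The loop in the question is truncated; this sketch rebuilds the same print call on 10 assumed toy samples.

```python
import numpy as np
from sklearn.model_selection import KFold  # replaces sklearn.cross_validation

X = np.arange(10).reshape(-1, 1)  # 10 toy samples (assumed data)

# Configure the number of folds, then iterate over kf.split(X).
kf = KFold(n_splits=5)

for iteration, data in enumerate(kf.split(X), start=1):
    # data is a (train_indices, test_indices) pair.
    print('{:^9} {} {!s:^25}'.format(iteration, data[0], data[1]))
```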

IndexError: too many indices for array

Please help me. I can't understand what exactly I'm doing wrong; I think the error is trivial, but I don't have enough knowledg ... f.fit(X_train, y_train) clf.score(X_test, y_test) clf.predict(X_test) print ('AdaBoostClassifier:\n', X_test[:9]) Error
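
The failing line is not visible in the excerpt, but this NumPy error generally means an array was indexed with more axes than it has, for example `[:, 0]` on a 1-D array. Checking `.shape` / `.ndim` before indexing usually reveals it; scikit-learn estimators also expect a 2-D `X`, so a single feature column needs `reshape(-1, 1)`. A sketch with a hypothetical label array:

```python
import numpy as np

y = np.array([0, 1, 1, 0])  # 1-D: shape (4,)
print(y.shape, y.ndim)

# y[:, 0] raises IndexError (too many indices for array),
# because y has only one axis.
try:
    y[:, 0]
except IndexError as err:
    print("IndexError:", err)

# Index with a single axis, or make the array 2-D first if a column is needed.
first = y[0]
column = y.reshape(-1, 1)  # shape (4, 1)
print(column[:, 0])        # the same values, now via two indices
```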