machine-learning

How to sort GridSearchCV.cv_results_

I'm taking a course in data science and they use the sklearn library, where there is a GridSearchCV method. The problem is th ... ain_fin, y_train_fin) sorted(gridsearch.cv_results_, key=lambda x: -x.mean_validation_score) What could be the problem?
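
A likely culprit (hedged, since most of the code is cut off): in current scikit-learn, cv_results_ is a dict of parallel arrays, so iterating over it yields the key names, and there is no mean_validation_score attribute (that belonged to the long-removed grid_scores_). A minimal sketch of ranking by the mean_test_score column instead, with a made-up SVC grid:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
gridsearch = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
gridsearch.fit(X, y)

# cv_results_ is a dict of arrays; a DataFrame makes it easy to sort.
results = pd.DataFrame(gridsearch.cv_results_)
print(results.sort_values("mean_test_score", ascending=False)[["params", "mean_test_score"]])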

How can I encode categorical features containing NaN without adding a new category?

For example, a feature that takes the values {'Male', 'Female', NaN}, when using OneHotEncoder (or some other means), transla ... such a dataset should be obtained regardless of whether the set on which the encoder was trained had NaN in some categories.
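
One hedged option, swapping pandas in for OneHotEncoder: pd.get_dummies ignores NaN by default, so a missing value becomes an all-zero row rather than an extra category, and reindexing new data to the training columns keeps the encoding identical whether or not NaN appeared at fit time. A small sketch with made-up data:

import numpy as np
import pandas as pd

train = pd.DataFrame({"sex": ["Male", "Female", np.nan]})
encoded = pd.get_dummies(train["sex"])        # columns: Female, Male
print(encoded)                                # the NaN row is all zeros

# Align new data to the training columns so the output shape never changes.
new = pd.DataFrame({"sex": ["Female", np.nan]})
encoded_new = pd.get_dummies(new["sex"]).reindex(columns=encoded.columns, fill_value=0)
print(encoded_new)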

How do I add micro avg to the classification report from sklearn.metrics?

I output the calculated metrics for the test data: print(classification_report(y_true, y_pred_classes, target_names = CLAS ... 0.92 0.92 0.92 10000 I'm missing micro avg in this report. How do I correct the output to add this line?
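
One point worth noting: for single-label classification over all classes the micro average coincides with accuracy, which is why newer scikit-learn prints an "accuracy" line instead of "micro avg". A sketch (with toy stand-ins for y_true and y_pred_classes) of computing the micro-averaged row explicitly:

from sklearn.metrics import classification_report, precision_recall_fscore_support

# Toy stand-ins for the y_true / y_pred_classes from the question.
y_true = [0, 1, 2, 2, 1, 0]
y_pred_classes = [0, 2, 2, 2, 1, 0]

print(classification_report(y_true, y_pred_classes))

# Micro-averaged precision / recall / F1, printed as an extra line.
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred_classes, average="micro")
print(f"micro avg  {p:.2f}  {r:.2f}  {f1:.2f}  {len(y_true)}")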

Data normalization

The task is to normalize (0-1) the order book in stock trading. In the order book, I can only see the 25 best price offers, i.e. ... base for normalization? And if so, how? (Orders are quickly canceled/appear and it is not clear what data should be recorded)
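
A minimal sketch of one possible base, per-snapshot min-max scaling over the visible levels; whether a longer history is a better base is exactly the open design question here, so treat this only as an illustration of the 0-1 mapping:

import numpy as np

def minmax_01(prices):
    # Scale one order-book snapshot to [0, 1] using its own min and max.
    prices = np.asarray(prices, dtype=float)
    lo, hi = prices.min(), prices.max()
    return np.zeros_like(prices) if hi == lo else (prices - lo) / (hi - lo)

snapshot = [100.5, 100.6, 100.7, 101.0, 101.2]   # hypothetical best offers
print(minmax_01(snapshot))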

Functions (metrics) for assessing the quality of classification

How can I use sklearn or numpy to find the fraction of wrongly predicted values? There are two arrays of numbers of the same ... d to compare them, and divide the number of incorrect answers by the length. Is it possible to do this in a single function?
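
This does fit in a single call; a sketch using sklearn.metrics (zero_one_loss is literally the fraction of mismatches, and 1 - accuracy_score gives the same number):

import numpy as np
from sklearn.metrics import accuracy_score, zero_one_loss

y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0])

print(zero_one_loss(y_true, y_pred))        # 0.4 - fraction of wrong answers
print(1 - accuracy_score(y_true, y_pred))   # same value
print(np.mean(y_true != y_pred))            # same thing in plain numpy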

Why do I need a train test split in sklearn?

I am now studying machine learning; can someone tell me in detail why ML needs X_train, X_test, y_train, y_test, the argu ... X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42) And how is the test_size parameter set?
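
A short sketch of why the split exists: the model is fit only on the train part, and the score on the held-out test part estimates how it behaves on unseen data; test_size=0.2 sends 20% of the rows to the test set (a float is a fraction, an int is an absolute count):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # fit on train only
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))             # the honest estimate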

Different output values with the same parameters when classifying data

I select the parameters for the best training of the classification model. I do it like this: print('Исходная обученность: ... ий: ', accuracy_score(res3, y_test3)) As a result, the results come out different. What am I doing wrong?
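
A hedged guess, since the full code is cut off: many estimators and train_test_split are randomized, so repeated runs give different numbers unless random_state is fixed everywhere randomness enters. A minimal reproducible sketch (with a stand-in classifier, not necessarily the one from the question):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=0)   # fixed seed
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))               # identical on every run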

Classification methods in machine learning

There is a certain classification task: for training, the classifier receives an array of strings as a class and some numbers ... the problem, because I don't yet know about methods that allow multiple classes per object. I will be grateful for your advice!
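
If the point is that one object can carry several classes at once, this is multi-label classification; a small hedged sketch with made-up data, using MultiLabelBinarizer plus a one-vs-rest wrapper:

from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

X = [[0.1, 2.0], [1.5, 0.3], [0.9, 1.1], [2.2, 0.4]]
labels = [["red", "round"], ["blue"], ["red"], ["blue", "round"]]   # string labels per object

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                     # one binary column per class

clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
pred = clf.predict([[1.0, 1.0]])
print(mlb.inverse_transform(pred))                # back to string labels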

Sampling and cross-validation

Tell me, I have a df... If I'm going to use cross-validation, is it enough to split my df into training and test samples, without setting aside a separate validation set? Or am I misunderstanding something?
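
Broadly yes: cross-validation on the training part plays the role of the validation set, while the test set is touched once at the very end. A minimal sketch of that scheme:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)   # validation happens here
print("cv mean:", cv_scores.mean())

model.fit(X_train, y_train)
print("final test score:", model.score(X_test, y_test))      # touched only once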

Why is there such a big difference in accuracy when applying the Gini test and entropy?

Hello everyone. I'm continuing to slowly study ML and got to the well-known 'Wine' dataset. And I ran into the following issue: if I use e ... rong (for example, I calculated the accuracy)? I read the theory and found no reason to expect such a big difference.
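
A hedged way to check this, assuming a decision tree on the Wine data: with the same split and the same random_state, gini and entropy normally land close to each other, so a large gap more often comes from a changing split or seed than from the criterion itself. A sketch of a controlled comparison:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)   # same seed for both
    tree.fit(X_train, y_train)
    print(criterion, tree.score(X_test, y_test))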

Logistic regression in Python

Here is the code from the course: y_pred_train=logreg.predict(x_train) y_predict_train=logreg.predict_proba(x_train)[:,1] lo ... that is, no matter how many times you call predict(), the coefficients or weights will stay the same. But I'm not so sure about that)
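
That intuition can be checked directly: predict() and predict_proba() only read the fitted coefficients and never refit the model, so coef_ stays exactly the same. A small sketch (breast_cancer is only a stand-in dataset):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
logreg = LogisticRegression(max_iter=5000).fit(X, y)

coef_before = logreg.coef_.copy()
logreg.predict(X)                 # read-only operations
logreg.predict_proba(X)[:, 1]
print(np.array_equal(coef_before, logreg.coef_))   # True - weights unchanged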

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

import pandas as pd import numpy as np from sklearn.tree import DecisionTreeClassifier from sklearn.model_selection import tr ... an error. ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
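
A hedged illustration, since the failing line is not visible: this ValueError usually appears when a whole DataFrame or Series is used where a single True/False is expected, e.g. `if df:` or conditions combined with and/or. A sketch of the element-wise alternative:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Raises "The truth value of a DataFrame/Series is ambiguous":
# if df["a"] > 1 and df["b"] < 6: ...

# Element-wise alternative: & / | with parentheses, then reduce explicitly.
mask = (df["a"] > 1) & (df["b"] < 6)
print(df[mask])
print(mask.any(), mask.all())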

Why does the Sigmoid activation function work, but ReLU does not?

Here is the code: from keras.models import Sequential from keras.layers import Dense import numpy from numpy import exp, ... (should be: 0, 1): For ReLU: accuracy: 50.00% [[0.9953842]] [[0]] For Sigmoid: accuracy: 100.00% [[0.13338517]] [[1]]
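
A hedged sketch (the full model is cut off): ReLU in the output layer is a poor fit for a 0/1 target, since it is unbounded and has zero gradient for negative inputs, so training can stall around 50%. Keeping ReLU in the hidden layer and sigmoid only on the output usually behaves well; the XOR data below is just a placeholder:

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

model = Sequential([
    Dense(8, input_dim=2, activation="relu"),   # ReLU is fine in the hidden layer
    Dense(1, activation="sigmoid"),             # output stays in (0, 1)
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=2000, verbose=0)
print(model.predict(X).round(3))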

L1 and L2 regularization, L1 and L2 norm

What do these concepts have in common and how do they differ? Do I understand correctly that with L1 regularization, some o ... not "fade" because of one feature? And the L1 and L2 norms, are they just different ways of calculating distance?
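
For reference, the standard definitions (not taken from the question) that tie the two together: the penalty added to the loss is the corresponding norm of the weight vector.

\|w\|_1 = \sum_i |w_i|, \qquad \|w\|_2 = \left( \sum_i w_i^2 \right)^{1/2}

\text{L1 (lasso):}\quad \min_w \; L(w) + \lambda \|w\|_1 \qquad \text{(drives some weights exactly to zero)}

\text{L2 (ridge):}\quad \min_w \; L(w) + \lambda \|w\|_2^2 \qquad \text{(shrinks all weights, none exactly zero)}

Both norms are members of the same Lp (Minkowski) family, which is the sense in which they are just different ways of measuring distance.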

Machine learning LDA, text classification

I need to classify texts (news) by their importance for the Russian Federation. Category 1: threat to NB (key phrases: terrorism, ... out myself. But I have little experience, and I can't figure out how to fill it in myself without feeding in a set of texts.
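
A hedged sketch that swaps in a supervised pipeline (TF-IDF plus logistic regression) rather than LDA, since supervised classification needs at least a small labeled set; all texts and labels below are placeholders:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["report about a terrorism threat", "economic news digest", "security threat warning"]
labels = [1, 0, 1]   # 1 = threat category, 0 = other (hypothetical labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["new report mentioning terrorism"]))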

Python average relative error of regression approximation

I'm new to python and don't quite understand how I can calculate the average relative approximation error using the formula ... ready-made function for getting this error, or is it just a loop? If it is a loop, do I need to normalize Y_test (the real values)?
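
A sketch of the usual formula, mean(|y_true - y_pred| / |y_true|); recent scikit-learn exposes the same quantity as mean_absolute_percentage_error, and no extra normalization of Y_test is needed because dividing by the real values already makes the error relative:

import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_test = np.array([100.0, 200.0, 50.0])
y_pred = np.array([110.0, 190.0, 55.0])

manual = np.mean(np.abs(y_test - y_pred) / np.abs(y_test))            # no loop needed
print(manual, mean_absolute_percentage_error(y_test, y_pred))         # same value; *100 for percent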

Why do activation functions in neural networks take on such small values?

After all, even if the values of the activation function were in the range from -10 to 10, this would make the network more ... ble, as it seems to me. Surely the problem can't just be the lack of a suitable formula. Please explain what I'm missing.

The simplest implementation of the linear regression algorithm. What was I wrong about?

I am implementing a linear regression algorithm based on two parameters. When the dataset is increased by an order of magnitu ... r(x, y) plt.plot(*zip(x, y), marker='o', color="r", ls="") plt.plot([t0 + t1 * i for i in range(10)]) plt.show()
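
A hedged guess, since the code is truncated: if the gradient is summed over the dataset rather than averaged, the effective step grows with the number of points, and enlarging the data by an order of magnitude can make the parameters diverge. A minimal stable version with averaged gradients:

import numpy as np

def fit_linreg(x, y, lr=0.01, epochs=5000):
    # Plain gradient descent on y = t0 + t1 * x with gradients averaged over n.
    x, y = np.asarray(x, float), np.asarray(y, float)
    t0, t1 = 0.0, 0.0
    for _ in range(epochs):
        err = t0 + t1 * x - y
        t0 -= lr * err.mean()
        t1 -= lr * (err * x).mean()
    return t0, t1

x = np.arange(10)
y = 3 + 2 * x
print(fit_linreg(x, y))   # close to (3, 2)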

Recognizing text in a specific area of a PDF scan in Python

There is a scan of a document in pdf format. How can I recognize text in a specific area of such a document, more precisely, a numeric value?
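
A minimal sketch, assuming pdf2image (which needs poppler installed) and pytesseract (which needs the Tesseract binary) are acceptable; the file name and crop-box coordinates are placeholders:

from pdf2image import convert_from_path
import pytesseract

page = convert_from_path("scan.pdf", dpi=300)[0]     # first page as a PIL image
region = page.crop((100, 200, 600, 300))             # (left, top, right, bottom) in pixels
# Single text line, digits only.
text = pytesseract.image_to_string(region, config="--psm 7 -c tessedit_char_whitelist=0123456789")
print(text)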

The ML k-nearest neighbors (kNN) algorithm

Tell me if it is possible and how you can add a condition so that the prediction of the label knn.predict(x_test) occurs only ... _score(knn, X, Y, cv=LeaveOneOut()) print(scores.mean()) Or is there another method more suitable for this purpose? Thanks!
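
One hedged way to express such a condition: threshold the neighbour vote from predict_proba and keep the label only when the majority is strong enough; the 0.8 threshold and the -1 "refuse to predict" marker below are made up:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, Y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(x_train, y_train)
proba = knn.predict_proba(x_test)                 # fraction of neighbours per class

threshold = 0.8                                   # hypothetical confidence level
confident = proba.max(axis=1) >= threshold
labels = np.where(confident, knn.classes_[proba.argmax(axis=1)], -1)   # -1 = no prediction
print(labels)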