machine-learning
How to sort GridSearchCV.cv results
I'm taking a course in data science and they use the sklearn library, where there is a GridSearchCV method, the problem is th ... ain_fin, y_train_fin)
sorted(gridsearch.cv_results_,key = lambda x: -x.mean_validation_score)
What could be the problem?
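In current sklearn versions `cv_results_` is a dict of parallel arrays, so iterating over it yields only the key strings, and there is no `mean_validation_score` attribute (that belonged to the long-removed `grid_scores_`). A minimal sketch of sorting via pandas, using a toy iris grid as a stand-in for the truncated code:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)

# cv_results_ is a dict of parallel arrays; wrap it in a DataFrame
# and sort by mean_test_score (the modern name of the score column).
results = (pd.DataFrame(grid.cv_results_)
             .sort_values("mean_test_score", ascending=False))
print(results[["params", "mean_test_score"]])
```

`grid.best_params_` and `grid.best_score_` already give the top row directly, so sorting is only needed to inspect the full ranking.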
How can I encode categorical features containing NaN without adding a new category?
For example, a feature that takes the values {'Male', 'Female', NaN}, when encoded with OneHotEncoder (or some other means), transla ... such a dataset should be obtained regardless of whether the set on which the encoder was trained had NaN in some categories.
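One common way to get an all-zeros row for NaN instead of a dedicated NaN column is `pd.get_dummies` with its default `dummy_na=False`; a minimal sketch on a made-up column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sex": ["Male", "Female", np.nan, "Male"]})

# get_dummies with dummy_na=False (the default) creates columns only for
# the observed categories; a NaN row becomes all zeros, not a new category.
encoded = pd.get_dummies(df["sex"], dummy_na=False)
print(encoded)
```

For a train/transform split, sklearn's `OneHotEncoder(handle_unknown="ignore")` behaves similarly for categories unseen at fit time, encoding them as all zeros, which may also cover this case depending on the sklearn version's NaN handling.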
How do I add micro avg to the classification report from sklearn.metrics?
I output the calculated metrics for the test data:
print(classification_report(y_true, y_pred_classes, target_names = CLAS ... 0.92 0.92 0.92 10000
I'm missing micro avg in this report. How do I correct the output to add this line?
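In recent sklearn versions the multiclass report prints an "accuracy" line instead of "micro avg", because the two are numerically identical when every label is included. A sketch with made-up labels showing the two usual workarounds:

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

# Restricting `labels` to a subset makes the "micro avg" row reappear:
print(classification_report(y_true, y_pred, labels=[0, 1]))

# Or compute the micro average directly; for multiclass over all labels
# micro precision == micro recall == micro F1 == accuracy.
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
print(p, r, f)
```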
Data normalization
The task is to normalize (0-1) the order book in stock trading.
In the order book, I can only see the 25 best price offers, i.e. ... base for normalization? And if so, how? (Orders are quickly canceled/appear and it is not clear what data should be recorded)
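Since the full book is unobservable, one common option is to scale each snapshot relative to its own visible window; a minimal min-max sketch over hypothetical volumes at the visible levels:

```python
import numpy as np

# Hypothetical snapshot: volumes at the best visible price levels.
volumes = np.array([120.0, 80.0, 45.0, 300.0, 15.0])

# Min-max normalization over the visible window only: each snapshot is
# scaled by its own min and max, so fast-changing orders outside the
# window never need to be recorded.
def minmax(x):
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)

print(minmax(volumes))
```

The trade-off is that values are only comparable within one snapshot; normalizing against a rolling window of recent snapshots is an alternative if cross-time comparability matters.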
Functions (metrics) for assessing the quality of classification
How can I use sklearn or numpy to find the fraction of wrongly predicted values?
There are two arrays of numbers of the same ... d to compare them, and divide the number of incorrect answers by the length. Is it possible to do this somehow in 1 function?
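Yes: the fraction of wrong predictions is exactly `1 - accuracy`, and sklearn ships it as a single function, `zero_one_loss`. A sketch on made-up arrays:

```python
import numpy as np
from sklearn.metrics import accuracy_score, zero_one_loss

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0, 1])

# zero_one_loss is the fraction of wrong predictions, i.e. 1 - accuracy;
# with plain numpy it is just the mean of an elementwise != comparison.
print(zero_one_loss(y_true, y_pred))        # 2 wrong out of 6
print(np.mean(y_true != y_pred))
print(1 - accuracy_score(y_true, y_pred))
```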
Why do I need a train test split in sklearn?
I'm now getting into machine learning; can someone tell me in detail why ML needs X_train, X_test, y_train, y_test, the argu ...
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)
And how is the test_size parameter set?
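The split exists so the model is scored on rows it never saw during fit, which estimates performance on genuinely new data; `test_size=0.2` simply holds out 20% of the rows. A minimal sketch on iris:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.2 holds out 20% of the rows; the model never sees them
# during fit, so the test score estimates performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(len(X_train), len(X_test))    # 120 / 30 split of iris's 150 rows
print(model.score(X_test, y_test))  # accuracy on the held-out rows
```

Common test_size choices are 0.2-0.3; larger test sets give a more stable estimate but leave less data for training.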
Different output values with the same parameters when classifying data
I select the parameters for the best training of the classification model.
I do it like this:
print('Initial accuracy: ... : ', accuracy_score(res3, y_test3))
As a result, the scores are different on every run.
What am I doing wrong?
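A likely cause is unfixed randomness: both the split and many estimators draw random numbers internally, so identical parameters still give different scores unless `random_state` is pinned everywhere. A sketch (with a random forest standing in for the truncated model):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# With random_state fixed in both the split and the estimator,
# repeated runs produce identical scores.
scores = []
for _ in range(2):
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    scores.append(accuracy_score(y_te, clf.predict(X_te)))
print(scores)
```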
Classification methods in machine learning
There is a certain classification task: for training, the classifier receives an array of strings as a class and some numbers ... the problem, because I don't know about methods with multiple classes for an object yet.
I will be grateful for your advice!
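If each object can carry several string classes at once, that is a multilabel problem; a common sklearn recipe is `MultiLabelBinarizer` plus a one-vs-rest wrapper. A sketch on made-up data (features and labels are illustrative):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data: each object has numeric features and a set of labels.
X = [[0.1, 1.0], [0.9, 0.2], [0.8, 0.9], [0.2, 0.1]]
labels = [["cat"], ["dog"], ["cat", "dog"], ["cat"]]

# MultiLabelBinarizer turns label sets into a 0/1 indicator matrix,
# and OneVsRestClassifier trains one binary classifier per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)          # columns: ['cat', 'dog']
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(mlb.classes_)
print(clf.predict([[0.85, 0.85]]))
```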
Sampling and cross-validation
Tell me, I have a df... If I'm going to use cross-validation, is it enough to split my df into training and test samples, without carving out a separate validation set? Or am I misunderstanding something?
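Right: cross-validation carves validation folds out of the training part itself, so no separate validation split is needed, and the test set stays untouched for the final estimate. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold CV repeatedly splits the training part into fit/validation folds;
# the held-out test set is only used once, at the very end.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_tr, y_tr, cv=5)
print(scores.mean())
```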
Why is there such a big difference in accuracy between the Gini criterion and entropy?
Hello everyone.
I continue to slowly study ML and have gotten to the well-known 'Wine' dataset. And I've hit a snag: if I use e ... rong (for example, I calculated the accuracy)? I read the theory and did not find any reason for such big differences.
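In theory the two criteria rarely differ much; a large gap usually comes from unfixed random seeds or different tree settings between the two runs rather than from the criterion itself. A sketch comparing them on Wine under identical conditions:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same split, same random_state, same depth settings: only the split
# criterion changes, so any remaining gap is attributable to it.
results = {}
for crit in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=crit, random_state=0).fit(X_tr, y_tr)
    results[crit] = tree.score(X_te, y_te)
    print(crit, results[crit])
```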
Logistic regression in Python
Here is the code from the course
y_pred_train=logreg.predict(x_train)
y_predict_train=logreg.predict_proba(x_train)[:,1]
lo ... that is, no matter how many times you call predict(), the coefficients or weights will stay the same.
But I'm not so sure about that)
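That intuition is correct: `predict()` and `predict_proba()` only read the fitted weights and never update them; only another `fit()` call changes `coef_`. A quick check on iris:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
logreg = LogisticRegression(max_iter=1000).fit(X, y)

# Snapshot the weights, call the prediction methods repeatedly,
# and verify the weights are bit-for-bit unchanged.
before = logreg.coef_.copy()
logreg.predict(X)
logreg.predict_proba(X)
after = logreg.coef_

print(np.array_equal(before, after))  # True
```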
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import tr ... an error.
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
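Since the code is truncated, here is the usual cause as a sketch: using a whole DataFrame or Series in a context that needs one `True`/`False` (an `if`, `and`/`or`, etc.). The comparison yields a boolean Series, which pandas refuses to collapse implicitly:

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 40, 31]})

# `if df["age"] > 30:` raises the ambiguity error, because the comparison
# produces a whole boolean Series, not a single True/False.
mask = df["age"] > 30

# Either aggregate to one truth value, or use the mask for filtering:
if mask.any():
    over_30 = df[mask]
    print(over_30)
```

The same applies to conditions like `if df == something` or chaining masks with `and`/`or` (use `&`/`|` with parentheses instead).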
Why does the Sigmoid activation function work, but ReLU does not?
There is such a code:
from keras.models import Sequential
from keras.layers import Dense
import numpy
from numpy import exp, ... (should be: 0, 1):
For ReLU:
accuracy: 50.00%
[[0.9953842]]
[[0]]
For Sigmoid:
accuracy: 100.00%
[[0.13338517]]
[[1]]
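The likely explanation: ReLU is fine in hidden layers but not as the *output* activation of a binary classifier. Sigmoid squashes any logit into (0, 1), so the output reads as a probability and can be thresholded at 0.5; ReLU outputs exactly 0 for every negative logit (killing the gradient for those samples) and is unbounded above. A numpy sketch of the two shapes, standing in for the Keras code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-3.0, 0.0, 3.0])

# Sigmoid maps every logit into (0, 1): usable as a probability.
# ReLU clamps negatives to 0 and passes positives through unbounded:
# not interpretable as a probability, and gradient-dead below zero.
print(sigmoid(z))
print(relu(z))
```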
L1 and L2 regularization, L1 and L2 norm
What do these concepts have in common and how do they differ?
Do I understand correctly that with L1 regularization, some o ... not "fade" because of one feature? And the L1 and L2 norms, are they just different ways of calculating distance?
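The connection: regularization adds the norm of the weight vector to the loss as a penalty. The L1 norm (sum of absolute values) has corners at zero, which pushes some weights exactly to zero (sparsity); the L2 norm (Euclidean length) shrinks all weights smoothly. And yes, as norms they are just two members of the same family of distance measures:

```python
import numpy as np

w = np.array([3.0, -4.0])

# L1 norm: sum of absolute values; L2 norm: Euclidean length.
l1 = np.abs(w).sum()              # 3 + 4 = 7
l2 = np.sqrt((w ** 2).sum())      # sqrt(9 + 16) = 5

# numpy computes the same via ord=1 and ord=2:
print(l1, np.linalg.norm(w, ord=1))
print(l2, np.linalg.norm(w, ord=2))
```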
Machine learning LDA, text classification
I need to classify texts (news) by their importance to the Russian Federation. Category 1: threat to national security (key phrases: terrorism, ... out myself. But I have little experience, and I can't figure out how to put it together myself without a ready set of texts
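One way to start without a labeled set is to bootstrap labels from the key phrases themselves and train a simple TF-IDF classifier on top; a toy sketch (the corpus, keywords, and labels below are all made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical mini-corpus; real news texts would go here.
texts = ["terrorist attack reported", "new park opened downtown",
         "terrorism threat level raised", "city festival this weekend"]

# Bootstrap labels from key phrases: 1 = security threat, 0 = other.
keywords = ("terror",)
labels = [int(any(k in t for k in keywords)) for t in texts]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["terrorism in the region"])))
```

Keyword labels are noisy, so manually reviewing a sample of them before training is worthwhile; topic models like LDA can also suggest themes but do not assign importance by themselves.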
Python average relative error of regression approximation
I'm new to Python and don't quite understand how I can calculate the average relative approximation error using the formula
... ready-made function for getting this error, or is it just a loop? If it is a loop, do I need to normalize Y_test (the real values)?
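No loop or normalization is needed: the division by the true values already makes the error scale-free, and the whole formula vectorizes in one numpy line. A sketch on made-up values:

```python
import numpy as np

y_true = np.array([100.0, 200.0, 50.0])
y_pred = np.array([110.0, 190.0, 55.0])

# Mean relative approximation error: mean(|y - y_hat| / |y|) * 100.
# Per-element relative errors here are 10%, 5%, 10%.
mre = np.mean(np.abs(y_true - y_pred) / np.abs(y_true)) * 100
print(mre)
```

Recent sklearn versions also ship this as `sklearn.metrics.mean_absolute_percentage_error` (without the factor of 100), if a ready-made function is preferred.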
Why do activation functions in neural networks take on such small values?
After all, even if the values of the activation function were in the values from -10 to 10, this would make the network more ... ble, as it seems to me. After all, the problem can't just be the lack of a suitable formula. Please explain what I'm missing.
The simplest implementation of the linear regression algorithm. What was I wrong about?
I am implementing a linear regression algorithm based on two parameters. When the dataset is increased by an order of magnitu ... r(x, y)
plt.plot(*zip(x, y), marker='o', color="r", ls="")
plt.plot([t0 + t1 * i for i in range(10)])
plt.show()
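Since the update code is truncated, a likely culprit when growing the dataset breaks training is a gradient without the 1/n factor: the step size then effectively scales with the number of points and gradient descent diverges. A sketch of the stable form, on hypothetical data matching the plot's `t0 + t1 * x` model:

```python
import numpy as np

# Hypothetical data: y ≈ 2 + 3x plus small noise.
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, size=10)

# Batch gradient descent with the 1/n factor (via .mean()): without it,
# the effective learning rate grows with the dataset size and training
# blows up when the data gets an order of magnitude larger.
t0, t1, lr = 0.0, 0.0, 0.01
for _ in range(20000):
    err = t0 + t1 * x - y
    t0 -= lr * err.mean()
    t1 -= lr * (err * x).mean()

print(t0, t1)  # converges close to 2 and 3
```

Scaling x to a similar range (e.g. standardizing) also keeps one learning rate workable across dataset sizes.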
Recognizing text in a specific area of a PDF scan in Python
There is a scan of the document in pdf format. How to recognize text in a certain area of such a document, more precisely, a digital number?
The ML k-nearest neighbors (kNN) algorithm
Tell me if it is possible and how you can add a condition so that the prediction of the label knn.predict(x_test) occurs only ... _score(knn, X, Y, cv=LeaveOneOut())
print(scores.mean())
Or is there another method more suitable for this purpose? Thanks!
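Yes, this is doable with `predict_proba`: for kNN it returns the fraction of neighbors voting for each class, so you can predict only where the winning class clears a chosen threshold and flag the rest as uncertain. A sketch on iris (the threshold 0.8 and the -1 "uncertain" marker are arbitrary choices):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# predict_proba = fraction of the k neighbors voting for each class.
# Keep the prediction only where the top class reaches the threshold;
# mark everything else as uncertain (-1 here).
proba = knn.predict_proba(X)
threshold = 0.8
labels = np.where(proba.max(axis=1) >= threshold,
                  knn.classes_[proba.argmax(axis=1)], -1)
print((labels == -1).sum(), "uncertain samples")
```

With n_neighbors=5, the threshold 0.8 means at least 4 of the 5 neighbors must agree.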