pandas

Parsing an HTML file with BeautifulSoup into a DataFrame

I have a file of the form: <TD class="c1">111-1111</TD> <TD class="c2">AA1111-1111</TD> <TD cla ... = value that I have now .368 to lead to a numeric value? a value of the form 0.368? I would be grateful for any information!

How to sort GridSearchCV.cv_results_

I'm taking a data science course that uses the sklearn library, which has a GridSearchCV class; the problem is th ... ain_fin, y_train_fin) sorted(gridsearch.cv_results_, key=lambda x: -x.mean_validation_score) What could be the problem?
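
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in current scikit-learn, `cv_results_` is a dict of arrays, so `sorted()` iterates over its string keys, which have no `mean_validation_score` attribute (that name belonged to the older `grid_scores_`). Loading the dict into a DataFrame and sorting by `mean_test_score` is one way around it. The dict below is a hand-written stand-in for `gridsearch.cv_results_` with made-up values.

```python
import pandas as pd

# Hand-written stand-in for gridsearch.cv_results_ (values are made up);
# the real attribute is a dict of arrays with these same keys.
cv_results = {
    "params": [{"C": 0.1}, {"C": 1}, {"C": 10}],
    "mean_test_score": [0.71, 0.84, 0.79],
    "rank_test_score": [3, 1, 2],
}

# One row per parameter combination, best score first.
results = pd.DataFrame(cv_results).sort_values("mean_test_score", ascending=False)
best_params = results.iloc[0]["params"]
print(results[["params", "mean_test_score"]])
```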

Convert the date to a string format with the number of days

The DataFrame has a column with values in the following format: 507 days 00:00:00. I need the output to have a value of just 507. I came across .strftime(), but how to apply it here is not very clear.
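
A minimal sketch, assuming the column has dtype `timedelta64[ns]`: the `.dt.days` accessor returns the whole-day component, so `.strftime()` is not needed. The column name `duration` and the sample values other than 507 are made up for illustration.

```python
import pandas as pd

# Hypothetical column name and sample data.
df = pd.DataFrame({"duration": pd.to_timedelta(["507 days", "12 days", "0 days"])})

# .dt.days extracts the integer number of days from each timedelta.
df["days"] = df["duration"].dt.days
print(df)
```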

How do I delete duplicate rows in an array?

Given a NumPy array: [[1 0 1 0 0 1] [1 1 1 0 0 1] [1 0 1 0 0 1] [1 0 1 0 1 1] [1 0 1 0 0 1]] a = [[1, 0, 1, 0, 0, 1], ... np.array(a) How can I delete duplicate rows? That is, the output should be: [[1 0 1 0 0 1] [1 1 1 0 0 1] [1 0 1 0 1 1]]
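
One way to do this, sketched with the array printed in the question: `np.unique(..., axis=0)` removes duplicate rows but returns them sorted, so the first-occurrence indices are used here to keep the original row order.

```python
import numpy as np

a = np.array([
    [1, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 1],
])

# return_index gives the position of the first occurrence of each unique row.
_, first_idx = np.unique(a, axis=0, return_index=True)

# Sorting those positions restores the order in which rows first appeared.
unique_rows = a[np.sort(first_idx)]
print(unique_rows)
```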

Iterating through DataFrame rows in Pandas (Python)

I need to write a function that takes each row of the dataframe and returns the column names (with a value of 1) as a l ... the same initially, but I can't think of how to swap the list of columns and df[i]. As a result, I hit a dead end.
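
A rough sketch of one approach, assuming the goal is a list of column names per row where the cell equals 1. The frame and its column names are invented for illustration, since the original data is not shown.

```python
import pandas as pd

# Hypothetical 0/1 frame.
df = pd.DataFrame({"a": [1, 0, 1], "b": [0, 0, 1], "c": [1, 1, 0]})

# Boolean mask of cells equal to 1; each mask row selects matching column names.
mask = df.eq(1).to_numpy()
ones_per_row = [df.columns[row].tolist() for row in mask]
print(ones_per_row)
```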

Pandas error: KeyError: "None of [Index (['Binary'], dtype= 'object')] are in the [columns]"

CSV file table: Binary 16_bit 0 0 0 1 1 1 2 10 2 3 11 3 4 ... ry'] y = dataset['16_bit'] print(x) Error: KeyError: "None of [Index(['Binary'], dtype='object')] are in the [columns]"

How to fill in NaN if only one value is possible?

Let's imagine a table of two columns in a dataframe: 0 NaN 0 NaN 0 1 1 0 1 5 1 NaN 2 0 0 NaN 0 1 2 1 2 NaN 3 NaN 3 500 3 N ... y the only non-NaN value on the right? Desired result: 0 1 0 1 0 1 1 0 1 5 1 NaN 2 0 0 1 0 1 2 1 2 NaN 3 500 3 500 3 500
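
A sketch of one reading of the desired result: group by the left column and fill NaN only when the group contains exactly one distinct non-NaN value. The column names `key` and `val` and the shortened sample are assumptions, since the full input is cut off.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and a shortened, made-up sample.
df = pd.DataFrame({
    "key": [0, 0, 0, 1, 1, 1, 3, 3],
    "val": [np.nan, np.nan, 1, 0, 5, np.nan, np.nan, 500],
})

def fill_if_single(s):
    # Fill only when the group has exactly one distinct known value.
    known = s.dropna().unique()
    return s.fillna(known[0]) if len(known) == 1 else s

df["val"] = df.groupby("key")["val"].transform(fill_if_single)
print(df)
```

Group 1 keeps its NaN because it has two candidate values (0 and 5).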

Replacing values with NaN

I work with a table of data that has both positive and negative values. How can I replace all positive values in a certain column with NaN? I tried it like this: df.loc[df['days_employed'] > 0, 'days_employed'] = "NaN" But I got an error.
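
The error itself is not shown, but one likely issue is that `"NaN"` in quotes is a string, not a missing value. A minimal sketch with made-up numbers, assigning `np.nan` instead:

```python
import numpy as np
import pandas as pd

# Made-up sample values for the days_employed column.
df = pd.DataFrame({"days_employed": [-8437.6, 340266.0, -4024.8, 0.0]})

# Assign the float missing-value marker, not the string "NaN".
df.loc[df["days_employed"] > 0, "days_employed"] = np.nan
print(df)
```

`df["days_employed"].mask(df["days_employed"] > 0)` would be an equivalent that returns a new Series.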

How does sort_values work for multiple columns in Pandas?

There is a DataFrame that can be sorted using df.sort_values(by = 'Name'), where Name is the name of the column by which we s ... ne', 'two']) How do we get this? one two three 2 1 2 3 1 1 3 4 3 1 4 2 0 2 1 5
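
A short sketch: `sort_values` accepts a list of column names and sorts by the first, breaking ties with the next. The input frame is cut off in the question, so the values below are reconstructed from the desired output and should be treated as an assumption.

```python
import pandas as pd

# Values reconstructed from the desired output in the question (an assumption).
df = pd.DataFrame(
    {"one": [2, 1, 1, 1], "two": [1, 3, 2, 4], "three": [5, 4, 3, 2]}
)

# Sort by "one" first; rows with equal "one" are ordered by "two".
sorted_df = df.sort_values(by=["one", "two"])
print(sorted_df)
```

A per-column direction can be given with a list, for example `ascending=[True, False]`.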

How do I add a new column with a category as a result of grouping the previous ones?

There is a DataFrame with columns 'floor' and 'floors_total'. I need to add a separate column with their grouping: 'First', ' ... re not correct in most cases (they do not fit the 'floor' == 'floors_total' condition). Can you tell me what I'm doing wrong?

How to read data from an Excel spreadsheet to a DataFrame in a special format? Pandas Library

Let's say there is the following table Time Value1 Value2 0 10 30.5 21.6 1 11 11 50.2 2 13 ... 1 28 114.1 15.33 2 27 16.1 15 Reading the entire file first and then changing it is NOT an option. Thanks.

Get a list of words in the text and the frequency of their repetition, and enter the result in a DataFrame

There is a text file with the following text: "Example example for python for test". Case-insensitive, i.e. Example = example Y ... frequency) of the type: example 2 for 2 python 1 test 1 And put all this in a DataFrame, first column: "word", second column: "frequency"
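
One possible sketch using `collections.Counter`. The text is inlined here instead of being read from a file, and splitting on `\w+` is an assumption about what counts as a word.

```python
import re
from collections import Counter

import pandas as pd

text = "Example example for python for test"

# Lowercase for case-insensitive counting, then split into word tokens.
words = re.findall(r"\w+", text.lower())
counts = Counter(words)

# most_common() orders by frequency, ties in order of first appearance.
df = pd.DataFrame(counts.most_common(), columns=["word", "frequency"])
print(df)
```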

Converting Pandas Timestamp to string

There is a parser code: import io from zipfile import ZipFile import pandas as pd def read_zip(zip_fn, extract_fn=None): ... o write to the database. But Timestamp is not written to the DB correctly, so I will have to convert it to a string.

When reading a csv file, Pandas does not split the data, but leaves it in the first column

I try to split the data, but pandas leaves everything in the first column. df = pd.read_csv('testdata.csv',sep=',',encoding = ... tach a link to a file that I can't read. https://drive.google.com/drive/folders/1RJDRRZZN9V8z5nCkFJA89jHzecrRWxBx?usp=sharing

Convert a pandas.Series obtained as a result of grouping into a pandas.DataFrame

I have a question about working with a table in Pandas. I get everything below #got the table from the database ... м - 110 17:46 Fried carp - 240 UPD: I tried to do as in the answer to the question

Save CSV / DataFrame to Excel with automatic column width setting, column autofilter, etc.

There was a need to automatically convert CSV files to Excel files, but so that the appropriate column width is automatically ... ers", the column width adapted to the data width or the width of the column name, and freezing the row with the column names?

Adjusting columns in pandas pivot tables

Colleagues, please help me fix the columns in the dataframe: Original dataframe: Registration number Year Name ... 87645.0 3487569 How to make it so that the name of the column "Value" is removed, and Fixed assets, Assets, Stocks remain.

How do I change the date format in a DataFrame?

The table contains data in the format: datetime64[ns]. In this case, the string has the form: 02.03.2020 0:00:00. How can I convert the entire column to the date format: 02.03.2020?
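
A minimal sketch: a `datetime64[ns]` column has no display format of its own, so one option is to produce a string column with `.dt.strftime`. The column name `date` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical column name and sample dates.
df = pd.DataFrame({"date": pd.to_datetime(["2020-03-02", "2020-12-31"])})

# The result is a column of strings, no longer datetime64[ns].
df["date_str"] = df["date"].dt.strftime("%d.%m.%Y")
print(df)
```

If date objects are preferred over strings, `df["date"].dt.date` drops the time part instead.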

Getting an element from a MultiIndex DataFrame tuple

I am studying the question whether it is possible to get a specific element from the MultiIndex tuple. I have a DataFrame w ... ike this: idclient=example.index['ID'] That is, get a specific element index by the column name, not by the element slice.
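
A sketch of one option: `Index.get_level_values` selects a MultiIndex level by name. The level name `ID` comes from the question; the second level, the `amount` column and all values are invented for illustration.

```python
import pandas as pd

# Level "ID" is from the question; everything else here is hypothetical.
index = pd.MultiIndex.from_tuples(
    [(101, "2020-01-01"), (102, "2020-01-02"), (103, "2020-01-03")],
    names=["ID", "date"],
)
example = pd.DataFrame({"amount": [10, 20, 30]}, index=index)

# Pick the level by name instead of slicing the index tuples by position.
idclient = example.index.get_level_values("ID")
first_id = idclient[0]
print(idclient.tolist(), first_id)
```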

How do I build a bar chart based on counting the unique values in a column?

I want to build a bar chart based on counting the unique values in the column: count 3 2 3 1 3 1 3 2 5 3 4 3 3 3 ... he list. How can this b be used to make exactly a list of "meetings", so that you can build a diagram: bars = plt.bar(a, b)
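
A sketch of how the two lists could be built with `value_counts`. The sample values are made up because the original column is cut off, and the plotting call itself is left as a comment so the snippet needs only pandas.

```python
import pandas as pd

# Made-up sample for the "count" column.
df = pd.DataFrame({"count": [3, 2, 3, 1, 3, 1, 3, 2, 5]})

# Occurrences of each unique value, ordered by the value itself.
freq = df["count"].value_counts().sort_index()

a = freq.index.tolist()  # unique values (x positions)
b = freq.tolist()        # how many times each value occurs (bar heights)
print(a, b)

# With matplotlib available, these can be passed on: bars = plt.bar(a, b)
```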