How to read an Excel table in pandas, skipping the first rows, without losing information?
Hello, I have a somewhat complicated problem. I have several .xls spreadsheets from which I need to extract some data, and their contents are organized in a strange way:
TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO
TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO
TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO
Alt. 280m
Lat. 1°1'S
Lon. 4°1'W
DADO 1 DADO 2
HORA UTC 0000 0100
18-dez-2004 23,0 24,0
19-dez-2004 24,9 24,9
20-dez-2004 26,1 26,1
21-dez-2004 26,6 26,1
22-dez-2004 22,3 22,4
23-dez-2004 25,9 26,0
The sheet has a large title at the top. Below it is some data in red that is important for my research. Right after that comes a row in blue with the column titles I need to read, a row in yellow with the time each value was collected, and, in green, the day.
When I try to read these sheets with Python in the usual way, the columns come out scrambled because of the title and the red data. I would like to be able to read this data properly.
I thought about writing a script that deletes the rows I don't need and then moves the red data into two separate columns at the end. I still don't know how to do that, and for some reason the first row is not deleted by my df.drop(line) after reading the file with pandas read_excel.
I'm stuck on this problem and don't know how to get around it: should I clean the data first, or can I work with it as it is? Thank you very much to anyone willing to help.
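A likely reason the first row does not disappear with df.drop: read_excel uses the first row of the sheet as column labels by default (header=0), so that row is not a data row and df.drop(0) removes the next row instead. Passing header=None keeps every sheet row as data, numbered from 0. The sketch below shows this with read_csv on a small in-memory CSV (an assumption, chosen only so it runs without an Excel file); read_csv has the same header parameter and default as read_excel.

```python
import io

import pandas as pd

# Stand-in for the top of the sheet, as CSV text instead of an .xls file.
raw = 'TITLE,,\nAlt. 280m,,\nHORA UTC,0000,0100\n18-dez-2004,"23,0","24,0"\n'

# Default header=0: the first row becomes the column labels, so it is not a
# data row, and df.drop(0) would remove the "Alt. 280m" row instead.
df_default = pd.read_csv(io.StringIO(raw))
print(df_default.columns.tolist())

# header=None: every row is data, numbered from 0, so drop(0) removes the title.
df_all = pd.read_csv(io.StringIO(raw), header=None)
df_all = df_all.drop(0)
print(df_all.iloc[0, 0])
```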
1 answer
A simple way to get around this problem is to use pandas' read_excel function, passing the skiprows parameter with the rows you want to ignore before the table starts in the Excel sheet (either a count of rows or a list/range of row indices):
import pandas as pd

df = pd.read_excel(
    "file.xlsx",
    sheet_name="Sheet1",
    skiprows=range(0, 10),  # skip the first 10 rows of the sheet
)
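Skipping rows discards the red values (Alt., Lat., Lon.), though. One way to keep them is to read the whole sheet once with header=None and split it yourself. The helper below is a hypothetical sketch, assuming the layout shown in the question: metadata cells in the first column starting with "Alt.", "Lat." or "Lon.", a row whose first cell is "HORA UTC" holding the collection times, and one row per day below it. The "DADO" label row is not used here; adapt the search strings to your real sheets.

```python
import pandas as pd


def split_sheet(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn a sheet read with header=None into a tidy table."""
    first_col = raw.iloc[:, 0].astype(str).str.strip()

    # Collect the metadata, e.g. "Alt. 280m" -> {"Alt": "280m"}.
    meta = {}
    for cell in first_col:
        for key in ("Alt.", "Lat.", "Lon."):
            if cell.startswith(key):
                meta[key.rstrip(".")] = cell[len(key):].strip()

    # Locate the row with the times; everything below it is data.
    header_pos = first_col.tolist().index("HORA UTC")
    times = raw.iloc[header_pos, 1:].astype(str).tolist()
    data = raw.iloc[header_pos + 1:].copy()
    data.columns = ["day"] + times
    data = data.dropna(subset=["day"])

    # Values may use a decimal comma ("23,0"), so convert them to floats.
    for col in times:
        text = data[col].astype(str).str.replace(",", ".", regex=False)
        data[col] = pd.to_numeric(text, errors="coerce")

    # Attach the metadata as extra columns at the end, so nothing is lost.
    for key, value in meta.items():
        data[key] = value
    return data.reset_index(drop=True)
```

With a real file, the call would be split_sheet(pd.read_excel("file.xlsx", sheet_name="Sheet1", header=None)). Reading everything first and locating the "HORA UTC" row avoids hard-coding how many rows to skip, which may differ between sheets.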