How to read an Excel table in pandas while skipping the first rows, without losing information?

Question

Hello, I have a somewhat complicated problem to solve. I have several xls tables from which I need to extract some data, and their contents are organized in a strange way:

TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO
TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO
TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO

Alt. 280m
Lat. 1°1'S
Lon. 4°1'W

                    DADO 1          DADO 2
    HORA UTC        0000                0100
    18-dez-2004     23,0                24,0
    19-dez-2004     24,9                24,9
    20-dez-2004     26,1                26,1
    21-dez-2004     26,6                26,1
    22-dez-2004     22,3                22,4
    23-dez-2004     25,9                26,0

This table has a large title at the top. Below it, in red, is data that is important for my research. Right after that comes the row with the column titles, in blue, which I need to read. The time at which the data was collected is in yellow, and the day is in green.

When I try to read these tables the conventional way with Python, the columns come out scrambled because of the title and the data in red. I would like to be able to read this data.

I thought about writing a script to delete the rows that are not useful to me and then move the data in red into separate columns at the end. I still don't know how to do this, and for some reason the first row is not removed when I call df.drop(line) on the DataFrame returned by pandas read_excel.

I am stuck on this problem and don't know how to get around it, or whether I should clean the data first or can work with it as it is. Thank you very much to anyone willing to help.

python excel pandas planilhas data-science

Author: Lucas, 2019-12-13

1 answer

score 2 · Accepted Answer

A simple way around this problem is to use the pandas read_excel function and pass the skiprows parameter with the rows you want to ignore before the table starts in the Excel sheet:

import pandas as pd

df = pd.read_excel("file.xlsx",
                   sheet_name="Sheet1",
                   skiprows=range(0, 10)  # skip the first 10 rows of the Excel sheet
                   )
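
Note that skiprows discards the Alt./Lat./Lon. lines along with the title. If those values also need to be kept, one possible approach is to read the whole sheet without a header and split it afterwards. The following is only a minimal sketch: the function name split_sheet is hypothetical, and it assumes that the header row starts with "HORA UTC", that the metadata rows start with "Alt.", "Lat." and "Lon." as in the sample from the question, and that the column titles are unique.

```python
import pandas as pd


def split_sheet(raw, header_marker="HORA UTC",
                meta_prefixes=("Alt.", "Lat.", "Lon.")):
    """Split a sheet read with header=None into (metadata dict, data table).

    Hypothetical helper: the marker and prefixes are assumptions taken
    from the sample layout in the question.
    """
    # Drop fully empty columns and make row labels equal to row positions.
    raw = raw.dropna(axis=1, how="all").reset_index(drop=True)

    # Join the non-empty cells of each row, so "Alt. 280m" is found whether
    # it sits in a single cell or is split across two cells.
    row_text = raw.apply(
        lambda row: " ".join(str(v).strip() for v in row.dropna()), axis=1
    )

    # Collect the metadata lines that appear above the table.
    metadata = {}
    for text in row_text:
        for prefix in meta_prefixes:
            if text.startswith(prefix):
                metadata[prefix.rstrip(".")] = text[len(prefix):].strip()

    # Locate the header row by looking for the marker in the first column.
    first_col = raw.iloc[:, 0].astype(str).str.strip()
    matches = first_col.index[(first_col == header_marker).to_numpy()]
    if len(matches) == 0:
        raise ValueError(f"header row {header_marker!r} not found")
    header_pos = matches[0]

    # Everything below the header row is the data table.
    table = raw.iloc[header_pos + 1:].dropna(how="all").reset_index(drop=True)
    table.columns = [str(name).strip() for name in raw.iloc[header_pos]]

    # Convert comma decimals such as "23,0" to floats in the data columns.
    for col in table.columns[1:]:
        as_text = table[col].astype(str).str.replace(",", ".", regex=False)
        table[col] = pd.to_numeric(as_text, errors="coerce")

    # Append the metadata as constant columns at the end of the table.
    for key, value in metadata.items():
        table[key] = value

    return metadata, table
```

With a real file, the raw input would come from something like pd.read_excel("file.xlsx", sheet_name="Sheet1", header=None), and metadata, table = split_sheet(raw) then returns the red values as a dict plus a table that carries them as extra columns. Locating the header row by its text instead of a fixed row count may also help when the title length varies between files.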