How to read a specific amount of data or lines in Python?
I have a .lis, .txt or .csv file and I need to take only a certain amount of data or lines from it and skip the rest. Specifically, I want only the data between two given lines or words: how do I find the word or line where the block starts and print everything from there up to another word or line where it ends?
So far I have only been able to read the file with this code:
abrir = open('clase1.lis', 'r')
while True:
    linea = abrir.readline()
    if not linea:
        break
    print(linea)
Another approach I tried was:
abrir = open('clase1.lis', 'r')
for q in abrir:
    print(q)
These and other attempts just print the whole file to the screen. As I said above, I only need one block of the file, and the file is very large.
4 answers
If the file is large, read it line by line instead of loading the entire file into memory. For example, given the following archivo.txt:
--------------------------
Hola me llamo Cesar
Soy de Lima
Me gusta Python
--------------------------
Hola me llamo Juan
Yo no soy de Lima
Odio Python
--------------------------
Hola me llamo Jose
Vivo cerca a Lima
Nunca he usado Python
--------------------------
Searching for the keyword Lima, you can collect every line that contains it:
palabra = 'Lima'
ocurrencias = []
with open('archivo.txt') as lineas:
    # The file object yields one line at a time, so memory use stays small
    for linea in lineas:
        if palabra in linea:
            ocurrencias.append(linea)
print(ocurrencias)
Or, more compactly, using filter:
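palabra = 'Lima'
# Iterate over the file object directly instead of calling readlines(),
# which would load the whole file into memory
with open('archivo.txt') as lineas:
    ocurrencias = list(filter(lambda line: palabra in line, lineas))
print(ocurrencias)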
For both cases the result will be a list with the lines found:
['Soy de Lima\n', 'Yo no soy de Lima\n', 'Vivo cerca a Lima\n']
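When the matches have to be located inside a large file, it can help to keep the line number next to each hit. The following is only a minimal sketch: `buscar` is a hypothetical helper (not part of the answer above), and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io

def buscar(lineas, palabra):
    """Yield (line_number, line) for each line containing `palabra`.

    Line numbers start at 1; the trailing newline is stripped.
    """
    for numero, linea in enumerate(lineas, start=1):
        if palabra in linea:
            yield numero, linea.rstrip("\n")

# Stand-in for open('archivo.txt'); it iterates line by line like a file
archivo = io.StringIO(
    "--------------------------\n"
    "Hola me llamo Cesar\n"
    "Soy de Lima\n"
    "Me gusta Python\n"
    "--------------------------\n"
    "Hola me llamo Juan\n"
    "Yo no soy de Lima\n"
)

resultado = list(buscar(archivo, "Lima"))
print(resultado)  # [(3, 'Soy de Lima'), (7, 'Yo no soy de Lima')]
```

Because `buscar` is a generator, nothing is read until the results are consumed, so it can be passed a real file object opened with `with open(...)` in the same way.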
Let's try a little trick: every file object behaves like an iterator, so you can loop over the file line by line. To get the text between two line positions (n, m), you can use the iterator utilities in the itertools module. Note that islice counts from 0 and stops before m:
import itertools
with open("datos.txt") as data:
    # n and m are the 0-based start and (exclusive) stop line positions
    texto = itertools.islice(data, n, m)
    for linea in texto:
        ...  # process each line here
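To make the indexing concrete, here is a small sketch with made-up data. The values of `n` and `m` and the contents of the fake file are assumptions for illustration only; `io.StringIO` is used in place of a real file.

```python
import io
import itertools

# Stand-in for a large file; iterating it yields one line at a time
data = io.StringIO("line 0\nline 1\nline 2\nline 3\nline 4\n")

n, m = 1, 3  # hypothetical bounds: start at index 1, stop before index 3
texto = [linea.rstrip("\n") for linea in itertools.islice(data, n, m)]
print(texto)  # ['line 1', 'line 2']
```

islice still has to read and discard the first n lines, but it never holds more than one line in memory and it stops reading once position m is reached.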
If you are looking for the lines that contain palabra, a generator expression is enough:
with open("datos.txt") as data:
    # Generator expression: lines are filtered lazily, one at a time
    ocurrencias = (linea for linea in data if palabra in linea)
    for linea in ocurrencias:
        ...  # process each matching line here
You can even combine both:
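import itertools
with open("datos.txt") as data:
    # First restrict to positions n..m-1, then keep only the matching lines
    texto = itertools.islice(data, n, m)
    ocurrencias = (linea for linea in texto if palabra in linea)
    for linea in ocurrencias:
        ...  # process each matching line here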
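The question asks for the lines between a starting word and an ending word rather than between two line numbers. In the same spirit, one possible sketch uses `itertools.dropwhile` and `itertools.takewhile`. The helper name `lineas_entre` and the marker strings are assumptions made for this example, and `io.StringIO` replaces the real file.

```python
import io
import itertools

def lineas_entre(lineas, inicio, fin):
    """Return an iterator over the lines that follow the first line
    containing `inicio`, stopping before the next line containing `fin`."""
    # Skip everything before the start marker
    resto = itertools.dropwhile(lambda l: inicio not in l, lineas)
    # Discard the start-marker line itself (no error if it was never found)
    next(resto, None)
    # Keep lines until the end marker shows up
    return itertools.takewhile(lambda l: fin not in l, resto)

datos = io.StringIO(
    "Hola me llamo Cesar\n"
    "Soy de Lima\n"
    "Me gusta Python\n"
    "--------------------------\n"
    "Hola me llamo Juan\n"
)

bloque = list(lineas_entre(datos, "Cesar", "-----"))
print(bloque)  # ['Soy de Lima\n', 'Me gusta Python\n']
```

Reading stops as soon as the end marker is found, so the rest of a large file is never scanned. If the marker lines themselves should be included, the helper would need a small change.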
Suppose your .csv file has the following content:
Irlanda,33°02'N,128°12'W
Rumania,33°03'N,128°25'W
Colombia,12°43'46″N,54°02'11″W
Los Angeles,34°03'N,118°15'W
Panama,40°42'46″N,74°00'21″W
Paris,48°51'24″N,2°21'03″E
Munchen,42°53'24″N,22°21'33″E
Mexico,30°42'36″N,44°00'21″W
Paris,48°51'24″N,2°21'03″E
Colombia,32°42'36″N,34°04'21″W
You can create a function to extract the records that contain the text you want:
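def buscaPalabra(palabra, archivo):
    # Collect every line that contains the search text.
    # The list is local, so repeated calls do not accumulate old results.
    lista = []
    for line in archivo:
        if palabra in line:
            lista.append(line)
    return lista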
For example, when searching for "Colombia":
# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'C:\Data\datos.csv', 'r') as archivo:
    print(buscaPalabra("Colombia", archivo))
You would get the matches of "Colombia":
["Colombia,12°43'46″N,54°02'11″W \n", "Colombia,32°42'36″N,34°04'21″W \n"]
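Since the data is comma-separated, another option is to let the standard csv module split each record and compare only the first column. This is a sketch under assumptions: `buscar_por_nombre` is a hypothetical helper, and a few rows from the sample above are placed in an `io.StringIO` instead of a file on disk.

```python
import csv
import io

def buscar_por_nombre(archivo, nombre):
    """Return the rows whose first column equals `nombre` exactly."""
    return [fila for fila in csv.reader(archivo) if fila and fila[0] == nombre]

# A few rows from the sample data, standing in for datos.csv
datos = io.StringIO(
    "Irlanda,33°02'N,128°12'W\n"
    "Rumania,33°03'N,128°25'W\n"
    "Los Angeles,34°03'N,118°15'W\n"
)

filas = buscar_por_nombre(datos, "Los Angeles")
print(filas)  # [['Los Angeles', "34°03'N", "118°15'W"]]
```

Matching on the column rather than on the raw line avoids false hits when the search text appears elsewhere in a record, and names that contain spaces, such as "Los Angeles", are handled without special cases.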
A question: the result I get when searching a .txt file is as follows:
['Usuario: carlos.lopez\r\n', 'gital<br><br>Usuario: carlos.carus<br><br>BP: 1378704 <br><br>CUIL: 2025201=\r\n']
What I need in this case is for carlos.lopez to end up in one variable, 1378704 in another, and the CUIL in a third. Can you help me with this?
The code is as follows:
def buscaPalabra(palabra, archivo):
    # Collect every line that contains the search text
    lista = []
    for line in archivo:
        if palabra in line:
            lista.append(line)
    return lista

with open('archivo.txt', 'r') as archivo:
    print(buscaPalabra("Usuario:", archivo))
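One possible way to pull those values into separate variables is a regular expression per field. The sketch below works on the two lines shown in the result above. The helper `primer_valor` and the three patterns are assumptions about the format (a label, a colon, then the value), not something confirmed by the original file.

```python
import re

# The two lines returned by the search above
lineas = [
    'Usuario: carlos.lopez\r\n',
    'gital<br><br>Usuario: carlos.carus<br><br>BP: 1378704 <br><br>CUIL: 2025201=\r\n',
]
texto = ''.join(lineas)

def primer_valor(patron, texto):
    """Return the first captured group for `patron`, or None if it is absent."""
    coincidencia = re.search(patron, texto)
    return coincidencia.group(1) if coincidencia else None

usuario = primer_valor(r'Usuario:\s*([\w.]+)', texto)  # letters, digits, _ and .
bp = primer_valor(r'BP:\s*(\d+)', texto)               # digits only
cuil = primer_valor(r'CUIL:\s*(\d+)', texto)           # digits only

print(usuario, bp, cuil)  # carlos.lopez 1378704 2025201
```

re.search returns only the first match, so `usuario` is carlos.lopez and not carlos.carus; `re.findall` with the same pattern would return both. The trailing "=" after the CUIL suggests the value may continue on the next line of the file, in which case those lines would have to be joined before matching.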