BeautifulSoup | Text parsing
import urllib.request
from bs4 import BeautifulSoup

def get_html(url):
    response = urllib.request.urlopen(url)
    return response.read()

def parse(html):
    soup = BeautifulSoup(html)
    projects = []
    table = soup.find('div', class_='wrap')
    rows = table.find_all('div', class_='listing_wr')
    for i in rows:
        projects.append({
            'title': i.a.text,
            'Лет': i.super  # 'Лет' is Russian for "years"
        })
    for i in projects:
        print(i)

def main():
    parse(get_html('http://kakoysegodnyaprazdnik.ru/'))

if __name__ == '__main__':
    main()
In short, I need to get the names and years of all the holidays on the site. I do get the text, but only for 3 holidays, and I can't get the years at all. Please help :(
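Two BeautifulSoup behaviors could explain these symptoms. First, `find()` returns only the first matching element, so rows are collected from the first `wrap` container only. Second, `i.super` is shorthand for `i.find('super')`: it searches for a `<super>` tag, not for an element with `class="super"`, so it yields `None`. The markup below is hypothetical (the real page may be structured differently, and it may have changed since 2018); it only illustrates both behaviors:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two "wrap" containers, each holding one listing.
html = """
<div class="wrap">
  <div class="listing_wr"><a href="#">Holiday A</a> <span class="super">25 years</span></div>
</div>
<div class="wrap">
  <div class="listing_wr"><a href="#">Holiday B</a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match; find_all() returns every match in the document.
first_wrap = soup.find("div", class_="wrap")
all_listings = soup.find_all("div", class_="listing_wr")
print(len(first_wrap.find_all("div", class_="listing_wr")))  # 1
print(len(all_listings))  # 2

listing = all_listings[0]
# Attribute access looks for a child *tag* named <super>; there is none here.
print(listing.super)  # None
# To match on the CSS class, pass class_ to find() instead.
print(listing.find("span", class_="super").text)  # 25 years
```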
1 answer
It's a little rough, but I think it does what you need:
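import urllib.request
from bs4 import BeautifulSoup

def get_html(url):
    response = urllib.request.urlopen(url)
    return response.read()

def parse(html):
    soup = BeautifulSoup(html, "lxml")  # the "lxml" parser requires the lxml package
    # Each holiday sits in its own <div class="main">
    rows = soup.find_all('div', class_='main')
    for row in rows:
        spans = row.find_all('span')
        if not spans:  # skip rows without any span instead of raising IndexError
            continue
        # The years, when present, are in a <span class="super">
        spans_year = row.find_all('span', class_='super')
        # The first span holds the holiday name
        print(spans[0].text, spans_year[0].text if spans_year else "")

def main():
    parse(get_html('http://kakoysegodnyaprazdnik.ru/'))

if __name__ == '__main__':
    main()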
The idea is that I find every div with class="main", then take the span with the holiday name (it seems to always be the first one), and then look for a span with class="super", which holds the years. If that span exists, its text is printed as well.
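The same approach can be sketched as a function that collects the results instead of printing them, which makes it easy to store them (for example in a list like the question's `projects`) or to test offline. This is a minimal sketch under assumptions: `extract_holidays` is a hypothetical helper name, the sample HTML is made up to match the structure described above, and it uses the built-in `html.parser` so lxml is not required.

```python
from bs4 import BeautifulSoup

def extract_holidays(html):
    """Return (name, years) pairs; years is "" when a row has no span.super."""
    soup = BeautifulSoup(html, "html.parser")
    holidays = []
    for row in soup.find_all("div", class_="main"):
        spans = row.find_all("span")
        if not spans:  # skip rows without any span
            continue
        # The first span is assumed to hold the holiday name
        name = spans[0].get_text(strip=True)
        year_span = row.find("span", class_="super")
        years = year_span.get_text(strip=True) if year_span else ""
        holidays.append((name, years))
    return holidays

# Hypothetical sample markup, not taken from the real site
sample = """
<div class="main"><span>Holiday A</span></div>
<div class="main"><span>Holiday B</span> <span class="super">25 years</span></div>
"""

for name, years in extract_holidays(sample):
    print(name, years)
```

To run it against the live page, pass it the result of `get_html(...)` from the answer's code.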
Author: Axenow, 2018-09-29 12:43:17