BeautifulSoup | Text parsing

Question

import urllib.request
from bs4 import BeautifulSoup

def get_html(url):
    response = urllib.request.urlopen(url)
    return response.read()


def parse(html):
    soup = BeautifulSoup(html)
    projects = []
    table = soup.find('div', class_= 'wrap')
    rows = table.find_all('div', class_='listing_wr')
    for i in rows:
        projects.append({
            'title' : i.a.text,
            'Лет': i.super
        })
    for i in projects:
        print(i)


def main():
    parse(get_html('http://kakoysegodnyaprazdnik.ru/'))

if __name__ == '__main__':
    main()

I need to get the names and years of all the holidays on the site. I managed to get the text, but only for 3 holidays, and I can't get the years at all. Please help.

0

parser requests beautiful-soup python-3.6 urllib

Author: HedgeHog, 2018-09-29

1 answer

score 1 · Accepted Answer

It's a little rough, but I think it does what you need:

import urllib.request
from bs4 import BeautifulSoup

def get_html(url):
    response = urllib.request.urlopen(url)
    return response.read()


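def parse(html):
    soup = BeautifulSoup(html, "lxml")
    # Each holiday sits in its own <div class="main">.
    rows = soup.find_all('div', class_='main')
    for row in rows:
        spans = row.find_all('span')
        if not spans:
            continue  # no title span in this block, skip it
        # The year count, when present, is in <span class="super">.
        spans_year = row.find_all('span', class_='super')
        print(spans[0].text, spans_year[0].text if spans_year else "")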


def main():
    parse(get_html('http://kakoysegodnyaprazdnik.ru/'))

if __name__ == '__main__':
    main()

In short: I find every div with class "main", then take the span that holds the holiday name, which always seems to be the first one. After that I look for a span with class="super", which holds the years, and print it only if it is there.
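
If you would rather collect the results than print them, the same idea can be sketched roughly like this. Note that SAMPLE_HTML is made-up markup that only imitates the structure described above, extract_holidays is a name chosen for illustration, and the built-in "html.parser" is used instead of "lxml" so nothing extra has to be installed. The real page may be laid out differently, so treat this as a starting point only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modeled on the structure described in the answer;
# the real page may differ.
SAMPLE_HTML = """
<div class="main"><span>World Heart Day</span><span class="super">18</span></div>
<div class="main"><span>International Coffee Day</span></div>
"""


def extract_holidays(html):
    """Return one {'title', 'years'} dict per <div class="main">."""
    soup = BeautifulSoup(html, "html.parser")
    holidays = []
    for row in soup.find_all("div", class_="main"):
        title_span = row.find("span")  # first span is assumed to be the name
        if title_span is None:
            continue  # no span at all, nothing to record
        year_span = row.find("span", class_="super")  # optional years span
        holidays.append({
            "title": title_span.get_text(strip=True),
            "years": year_span.get_text(strip=True) if year_span else "",
        })
    return holidays


holidays = extract_holidays(SAMPLE_HTML)
for holiday in holidays:
    print(holiday)
```

Returning a list keeps the parsing separate from the output, so the same function could feed a CSV file or a database later.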