How to properly parse HTML tags in Python

Question

I need to parse an HTML page in Python. The page is https://3dtoday.ru/3d-models?page=1, and this is the fragment I need to parse:

<div class="threedmodels_models_list__elem__title">
    <a href="https://3dtoday.ru/3d-models/for-home/kitchen/derzhatel-filtra-rozhka-kofevarki" title="">
        Держатель фильтра рожка кофеварки.
    </a>
</div>

How do I extract the text of the <a> tag?

python html

Author: JackWolf, 2020-06-23

2 answers

Beautiful Soup is a popular library for parsing HTML and XML. See this tutorial: http://zetcode.com/python/beautifulsoup/

A similar question is discussed here: https://stackoverflow.com/q/13240700/13468321
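
As a minimal sketch of what that looks like for the fragment in the question: the HTML below is pasted in as a string rather than downloaded, and the variable names are only illustrative. It assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question (not fetched from the site).
html = """
<div class="threedmodels_models_list__elem__title">
    <a href="https://3dtoday.ru/3d-models/for-home/kitchen/derzhatel-filtra-rozhka-kofevarki" title="">
        Держатель фильтра рожка кофеварки.
    </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the title <div> by its class, then the <a> inside it.
div = soup.find("div", class_="threedmodels_models_list__elem__title")
link = div.find("a")

# get_text(strip=True) drops the surrounding spaces and newlines.
title = link.get_text(strip=True)
url = link["href"]

print(title)
print(url)
```

On a real page, find() returns None when nothing matches, so it is worth checking the result before calling methods on it.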

Author: oxog hex, 2020-06-23 07:51:20

score 0 · Accepted Answer

You can use requests and bs4:

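import requests
from bs4 import BeautifulSoup

# Download the page and parse it with Python's built-in HTML parser.
r = requests.get('https://3dtoday.ru/3d-models?page=1')
soup = BeautifulSoup(r.text, 'html.parser')

# Collect every <div> with this class.
element = soup.find_all('div', class_='threedmodels_models_list__elem__str')
# A Tag can be searched directly, so re-parsing str(element[1]) is not needed.
# Index [2] picks the third <a> inside the block, which holds the title here.
title = element[1].find_all('a')[2].text.strip()

print(title)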

Output: Держатель фильтра рожка кофеварки. (in English: "Coffee maker horn filter holder.")

In response to a comment: to get the titles of all 18 elements, add a loop:

# Print the title from each of the first 18 blocks found above.
for index in range(18):
    title = element[index].find_all('a')[2].text.strip()
    print(title)
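
The loop above depends on a fixed count (18) and a fixed link position ([2]), both of which break if the page layout changes. A possible alternative, sketched here under assumptions, is to select the title links directly by the class shown in the question. The helper name extract_titles and the sample markup are hypothetical. Only the class name comes from the question.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return (title, url) pairs for every title link found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: every <a> inside a <div> with the title class.
    links = soup.select("div.threedmodels_models_list__elem__title a")
    return [(a.get_text(strip=True), a.get("href")) for a in links]


# Made-up sample markup for illustration; the real page may be structured differently.
sample_html = """
<div class="threedmodels_models_list__elem__title">
    <a href="https://example.com/models/first">  First model  </a>
</div>
<div class="threedmodels_models_list__elem__title">
    <a href="https://example.com/models/second">
        Second model
    </a>
</div>
"""

for title, url in extract_titles(sample_html):
    print(title, url)
```

To run it against the live page, pass r.text from the requests call above to extract_titles. The function returns an empty list when no matching links are found.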