What is the best way to parse a single-page application (SPA) website in Python?
There is a site from which I want to parse the data about a player (the link is in the code below).
I wrote this code for this purpose:
import requests
from bs4 import BeautifulSoup


def get_html():
    r = requests.get(url='https://www.atptour.com/en/players/felix-auger-aliassime/ag37/overview')
    return r.text


html = get_html()


def get_career(html):
    soup = BeautifulSoup(html, 'lxml')
    career = soup.find('tr')
    print(career)


get_career(html)
The problem is that the page I'm parsing is a single-page application,
so the data I need is not in the HTML that the server returns for the page;
it is loaded by JavaScript afterwards.
What is the best way to parse SPA sites?
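To make the problem concrete, here is a minimal sketch that uses only the standard library. Both HTML strings are made-up placeholders, not the real markup of the site: `SHELL_HTML` stands in for what a plain HTTP GET might return from an SPA, and `RENDERED_HTML` for the DOM after a browser has run the page's JavaScript. The helper `count_tags` is hypothetical as well.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times a given start tag appears in an HTML string."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == self.tag:
            self.count += 1


def count_tags(html, tag):
    """Return the number of <tag> elements opened in html."""
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


# Placeholder SPA shell: an empty mount point plus a script (assumption).
SHELL_HTML = """
<html><body>
  <div id="app"></div>
  <script src="/bundle.js"></script>
</body></html>
"""

# Placeholder DOM after the script has filled the page (assumption).
RENDERED_HTML = """
<html><body>
  <div id="app">
    <table class="mega-table">
      <tr><td>label A</td><td>value A</td></tr>
      <tr><td>label B</td><td>value B</td></tr>
    </table>
  </div>
</body></html>
"""

print(count_tags(SHELL_HTML, "tr"))     # 0: nothing to find before rendering
print(count_tags(RENDERED_HTML, "tr"))  # 2: rows exist only after rendering
```

If the real response looks like the shell, `soup.find('tr')` has nothing to match, which would explain a `None` result from the code above.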
0
Author: Дух сообщества, 2019-12-28
1 answer
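from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chromedriver = 'C:\\Program Files (x86)\\chromedrv\\chromedriver.exe'  # the driver path can be anything
opts = webdriver.ChromeOptions()
opts.add_argument('--headless')  # run Chrome without a visible window
# Selenium 4 takes the driver path via Service instead of executable_path
browser = webdriver.Chrome(service=Service(chromedriver), options=opts)
# browser.implicitly_wait(20)
browser.get('https://www.atptour.com/en/players/felix-auger-aliassime/ag37/player-stats')
# find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name
mtlist = browser.find_elements(By.CLASS_NAME, 'mega-table')
for mt in mtlist:
    print(mt.text + '\n')
browser.quit()  # close the browser and the driver process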
This is one way to do it: Selenium drives a headless Chrome browser, which runs the page's JavaScript, and the text is then read from the rendered elements.
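Because an SPA fills in its content after the initial page load, the element lookup can run before the tables exist and return an empty list. Below is a minimal, library-agnostic sketch of polling until the elements appear. The helper `wait_for_elements` and its parameters are hypothetical names chosen for illustration; Selenium ships its own `WebDriverWait` class for the same purpose.

```python
import time


def wait_for_elements(find, timeout=20.0, poll=0.5):
    """Call find() repeatedly until it returns a non-empty result.

    find    -- zero-argument callable returning a list of elements
    timeout -- maximum number of seconds to keep trying
    poll    -- pause between attempts, in seconds
    Raises TimeoutError if nothing is found before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f'no elements found within {timeout} seconds')
        time.sleep(poll)
```

With Selenium, `find` could be a lambda that wraps the `mega-table` lookup from the code above, so the loop over the tables starts only once at least one of them has been rendered.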
2
Author: anshap, 2019-12-29 07:24:23