Parsing beru.ru fails with '403' or 'Connection aborted' errors
When trying to parse the site beru.ru, the following errors are returned:
403:
import requests
from bs4 import BeautifulSoup
import time

URL = 'https://m.beru.ru/catalog/tovary-dlia-avto-i-mototekhniki/76688/list?hid=90402&how=aprice#1-0'


class ParserBeru:
    def __init__(self, url):
        self.url = url
        self.session = requests.Session()
        self.session.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
            'Accept-Language': 'ru',
        }

    def get_page(self):
        res = self.session.get(url=self.url)
        res.raise_for_status()
        return res.text


def main():
    parser = ParserBeru(url=URL)
    soup = BeautifulSoup(parser.get_page(), 'lxml')
    print(soup)


if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        print(f'Error reading the page. Please wait...\n{e}')
        time.sleep(5)
And I get "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))" when I change the headers to:
self.session.headers = {
    'Host': 'https://m.beru.ru/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'ru,en-US;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache'
}
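A likely cause of the RemoteDisconnected error (an assumption, not verified against beru.ru specifically): the `Host` value above is a full URL with scheme and trailing slash, but HTTP requires `Host` to be a bare hostname, and some servers drop the connection without a response when it is malformed. Since requests derives `Host` from the URL automatically, the simplest fix is to omit it entirely; if you do set it, use the hostname only, as in this sketch:

```python
# Sketch: the Host header, if set at all, must be a bare hostname.
# requests fills it in from the URL automatically, so omitting it is safest.
headers = {
    'Host': 'm.beru.ru',  # not 'https://m.beru.ru/'
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
    'Accept-Language': 'ru',
}
```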
PS: if I parse with Selenium, everything works fine. I assume either there is a problem with cookies, or beru sends some follow-up request that requests, of course, does not answer. If you could write a couple of lines of code solving this problem, I would be very grateful :). If you read the response body, it says: "Access to our service is temporarily prohibited! It is possible that your computer is infected with malware that automatically accesses Yandex." etc.
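Since Selenium gets through, one common workaround is to let a real browser pass the anti-bot check once, then hand its cookies over to requests. Below is a minimal sketch of just the conversion step; `cookies_from_selenium` is a hypothetical helper name, and the sample cookie is made up. With a real driver you would feed it the result of `driver.get_cookies()`:

```python
def cookies_from_selenium(driver_cookies):
    """Flatten Selenium's get_cookies() output (a list of dicts)
    into the plain name -> value mapping that requests accepts."""
    return {c['name']: c['value'] for c in driver_cookies}

# Hypothetical sample of what driver.get_cookies() returns:
sample = [{'name': 'somecookie', 'value': 'abc123', 'domain': '.beru.ru'}]

jar = cookies_from_selenium(sample)
# requests.get(URL, cookies=jar) would then send the browser's cookies along.
```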
Thanks in advance for your help.
1 answer
Maybe I'm wrong, but perhaps you need to send cookies to the site. For example:
HEADERS = {'Cookie': 'erhfuiwhgfiuwerhiw4ueghfwoqghuyq4go7w4gfuier(something)'}
r = requests.get(url, headers=HEADERS)
I hope it helped
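One small correction to the snippet above: with requests, cookies are usually passed via the `cookies` argument or a Session's cookie jar rather than as a raw header (and the header name would be `Cookie`, not `cookes`). A sketch, where the cookie name and value are placeholders:

```python
import requests

session = requests.Session()
session.cookies.set('sessionid', 'abc123')  # placeholder name/value

# Preparing a request shows how the jar is serialized into the Cookie header:
prepared = session.prepare_request(requests.Request('GET', 'https://m.beru.ru/'))
print(prepared.headers['Cookie'])  # 'sessionid=abc123'
```

Using the jar (or the `cookies=` argument) lets requests handle the header formatting and domain matching for you.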