Sites with authentication-Web Scraping-Python

Question

I am trying to automate collecting data from the web with Python. I need to pull information from the page https://sistema.justwebtelecom.com.br/adm.php. However, before reaching that page you have to log in at https://sistema.justwebtelecom.com.br/login.php. In theory, the code below should log in to the site:

from selenium import webdriver
from bs4 import BeautifulSoup

import time
import requests

browser = webdriver.Firefox()
browser.get("https://sistema.justwebtelecom.com.br/login.php")
time.sleep(3)
username = browser.find_element_by_id("email")
password = browser.find_element_by_id("senha")

username.send_keys("MEU-USUARIO")
password.send_keys("MINHA-SENHA")

time.sleep(2)
login_attempt = browser.find_element_by_id('entrar').click()
time.sleep(5)

url = 'https://sistema.justwebtelecom.com.br/adm.php'

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
lista = soup.find_all('html')

print(lista)

However, when I print the variable lista, I get the source code of the page https://sistema.justwebtelecom.com.br/login.php, that is, the page as it is before logging in. What I want is to print the page after logging in, when I have access to the panel .../adm.php.

I would like to know whether there is a way to get this information. When I open the Network tab in the browser, I can see some of the data arriving through requests that use the POST method, but I cannot print that data from my script.

python web-scraping python-requests

Author: Rafael Garcia, 2020-04-14

1 answer

score 2 · Answer 1

Hello, and welcome.

I noticed some errors in your code, along with a few things I would do differently.

The main error in your code is that you log in through the browser automated by Selenium and then immediately make a separate request with requests to a page that requires a session. requests does not reuse the session you opened with Selenium, so the server treats that request as not logged in and returns the login page.

Solution:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

import time

browser = webdriver.Firefox()
browser.get("https://sistema.justwebtelecom.com.br/login.php")
time.sleep(3)  # give the login form time to load

# find_element_by_id was removed in Selenium 4; find_element(By.ID, ...) replaces it
username = browser.find_element(By.ID, "email")
password = browser.find_element(By.ID, "senha")

username.send_keys("MEU-USUARIO")
password.send_keys("MINHA-SENHA")

time.sleep(2)
browser.find_element(By.ID, "entrar").click()
time.sleep(5)  # wait for the page shown after login to load

# Read the HTML from the logged-in browser instead of making a separate request
html = browser.page_source
soup = BeautifulSoup(html, 'lxml')
lista = soup.find_all('html')

print(lista)

With this change, you stop fetching the source code through an external request and instead obtain it from Selenium itself. Since I do not have credentials to test this solution, please test it yourself.
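If you would still rather fetch pages with requests after logging in, one possible alternative is to pass the browser's cookies along with the request. The sketch below assumes the site tracks the login purely through cookies, which I have not verified for this site. The helper names cookies_for_requests and fetch_with_browser_session are made up for illustration.

```python
def cookies_for_requests(selenium_cookies):
    """Convert Selenium's cookie list into a {name: value} dict.

    browser.get_cookies() returns a list of dicts, each with at least
    the keys "name" and "value".
    """
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def fetch_with_browser_session(browser, url):
    """Fetch url with requests, sending the cookies of a logged-in browser.

    Assumption: the server identifies the session by cookies only.
    """
    # Third-party import kept local so the helper above has no dependencies.
    import requests

    cookies = cookies_for_requests(browser.get_cookies())
    return requests.get(url, cookies=cookies)
```

After the login steps above, fetch_with_browser_session(browser, 'https://sistema.justwebtelecom.com.br/adm.php').text would then be the HTML to hand to BeautifulSoup, provided the assumption about cookies holds.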

Finally, I do not know whether this code is only a test. If it is not, I advise rewriting it with functions and separating the responsibilities into parts, because if you later extend the code, maintaining it in its current form can become difficult.
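As a rough idea of what that separation could look like, here is a minimal sketch. The function names login, get_page_html and main are only suggestions, the pause lengths are guesses, and like the code above it is untested against the real site.

```python
import time


def login(browser, login_url, user, password, pause=3):
    """Open the login page, fill in the form and submit it."""
    browser.get(login_url)
    time.sleep(pause)  # crude wait for the form to load
    # "id" is the string value of Selenium's By.ID locator strategy
    browser.find_element("id", "email").send_keys(user)
    browser.find_element("id", "senha").send_keys(password)
    browser.find_element("id", "entrar").click()
    time.sleep(pause)  # crude wait for the page shown after login


def get_page_html(browser):
    """Return the HTML of the page currently shown in the browser."""
    return browser.page_source


def main():
    # Third-party imports live here so the helpers above can be
    # exercised with a fake browser object in tests.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        login(browser, "https://sistema.justwebtelecom.com.br/login.php",
              "MEU-USUARIO", "MINHA-SENHA")
        soup = BeautifulSoup(get_page_html(browser), "lxml")
        print(soup.find_all("html"))
    finally:
        browser.quit()  # always close the browser, even on errors
```

Each step can now be changed or tested on its own. For example, login can be checked with a stand-in browser object, without opening Firefox.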