Scrapy for login

I took this code from the internet and changed it a bit to log in to the CPFL website, but when I run the command scrapy crawl myproject nothing happens, and the command scrapy runspider items.py gives the error:

No <form> element found in <https://servicosonline.cpfl.com.br/agencia-webapp/>

Can you tell me what's wrong?

import scrapy
BASE_URL = 'https://servicosonline.cpfl.com.br/agencia-webapp/#/login'
USER_NAME = 'username'
PASSWORD = 'password'
class ShareSpider(scrapy.Spider):
    name = "sharespider"
    start_urls = ['https://servicosonline.cpfl.com.br/agencia-webapp/#/login']
    def parse(self, response):
        yield scrapy.FormRequest.from_response(
            response,
            formxpath='//form[@id="panelMobile"]',
            formdata={
                'documentoEmail': USER_NAME,             
                'Password': PASSWORD,             
                'Action':'1',
            },
            callback=self.after_login)
    def after_login(self, response):
        pass
Author: nosklo, 2018-08-08

1 answer

The problem is that the username and password form is not on the page you are loading: that page contains only JavaScript code, and the form is assembled dynamically by that code.

Since Scrapy doesn't run JavaScript, you can't use it that way on this site. That leaves you with two alternatives:

  • Analyze the JavaScript code of the page, find out what it does, and "simulate" it with manually written Python code. This solution is usually more efficient, but much more complex to implement.

    In the specific case of the CPFL website, when the login is submitted the JavaScript makes an AJAX HTTP POST to https://servicosonline.cpfl.com.br/agencia-webapi/api/token with the following parameters:

    {
        'client_id': 'agencia-virtual-cpfl-web',
        'grant_type': 'password',
        'username': USER_NAME,
        'password': PASSWORD,
    }
    

    To find this out, I used the Firefox inspector (press F12) and tried to log in; in the Network tab you can see everything the page does over the network.

    yield scrapy.FormRequest(
        url='https://servicosonline.cpfl.com.br/agencia-webapi/api/token',
        formdata={
            'client_id': 'agencia-virtual-cpfl-web',
            'grant_type': 'password',
            'username': USER_NAME,
            'password': PASSWORD,
        },
        callback=self.after_login,
    )
    

    The code above should log you in, but the response will not be a page; it will be something like a token. You will have to keep inspecting the site with the browser to figure out what to do with it to get what you want. Logging in is just the beginning of the problem.

  • The other alternative, much simpler to implement, is to use Selenium, a library that lets you control a real browser (such as Chrome or Firefox) from Python, so the page's JavaScript actually runs. It is much less efficient, though, because you are running an entire browser.
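Since grant_type is 'password', the token endpoint most likely behaves like a standard OAuth2 password grant and returns JSON containing an access_token field. That is an assumption; confirm the actual response shape in the browser's Network tab. Under that assumption, a minimal sketch of how after_login could turn the token response into an Authorization header for later API requests (the follow-up endpoint in the comment is hypothetical):

```python
import json

def bearer_header(token_response_text):
    """Parse an OAuth2-style token response (assumed shape:
    {"access_token": "...", "token_type": "bearer", ...}) and
    build the Authorization header for subsequent API requests."""
    data = json.loads(token_response_text)
    return {'Authorization': 'Bearer ' + data['access_token']}

# Inside the spider it would look roughly like this (sketch only;
# the endpoint below is a placeholder, not a real CPFL URL):
#
#     def after_login(self, response):
#         headers = bearer_header(response.text)
#         yield scrapy.Request(
#             'https://servicosonline.cpfl.com.br/agencia-webapi/api/...',
#             headers=headers,
#             callback=self.parse_account,
#         )
```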

I hope this points you in the right direction.

Author: nosklo, 2018-08-08 16:40:32