How to get Gmail source code using Python3

Question

I am accessing my email using this code, which I found and adapted:

import requests
from bs4 import BeautifulSoup

form_data = {'Email': '[email protected]', 'Passwd': 'senhaexemplo'}
post = "https://accounts.google.com/signin/challenge/sl/password"

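def login():
    with requests.Session() as s:
        # Fetch the sign-in page and copy its hidden form fields into form_data.
        soup = BeautifulSoup(s.get("https://mail.google.com").text, "html.parser")
        for inp in soup.select("#gaia_loginform input[name]"):
            if inp["name"] not in form_data:
                # Some inputs have no value attribute; default to an empty string.
                form_data[inp["name"]] = inp.get("value", "")
        # Submit the credentials, then request the inbox with the same session.
        s.post(post, data=form_data)
        html = s.get("https://mail.google.com/mail/u/0/#inbox").text
        print(html)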

My goal is to fetch the emails and print each one's subject and content on the screen. I know how to do that by selecting certain HTML tags, but for that I need the page's source code. When I look at the result of print(html), it contains no tags; everything is compressed, something like this:

{\"1\":\"be_35\",\"53908043\":0},{\"1\":\"be_36\",\"53908043\":0},{\"1\":\"be_30\",\"53908043\":0},{\"1\":\"be_31\",\"53908043\":0},{\"1\":\"be_169\",\"53908043\":0},{\"1\":\"su_ltz\"},{\"1\":\"ic_sspvcd\"},{\"1\":\"bu_wdtfsm\"},{\"1\":\"be_26\",\"53908043\":0},{\"1\":\"be_29\",\"53908043\":0},{\"1\":\"be_280\",\"53908043\":0},{\"1\":\"be_281\",\"53908043\":0},{\"1\":\"30\",\"53908046\":0},{\"1\":\"31\",\"53908043\":0},{\"1\":\"32\",\"53908046\":0},{\"1\":\"33\",\"53908046\":0},{\"1\":\"be_277\",\"53908043\":0},{\"1\":\"34\",\"53908045\":\"\"},{\"1\":\"be_278\",\"53908043\":0},{\"1\":\"35\",\"53908046\":0},{\"1\":\"be_275\",\"53908043\":0},{\"1\":\"be_276\",\"53908043\":0},{\"1\":\"be_273\",\"53908043\":1},{\"1\":\"38\",\"83947487\":{}},{\"1\":\"se_192\",\"53908045\":\"en,es,pt,ja,fr\"},{\"1\":\"be_274\",\"53908043\":0},{\"1\":\"39\",\"53908046\":0}

How can I get the correct source code?

python-3.x email request

Author: dfop02, 2018-10-01

1 answer

score 1 · Accepted Answer

Not wanting to rain on your parade, but sites that use AJAX do not return their content in the initial HTML; they generate it dynamically with JavaScript after the page loads. You would need a radically different solution, such as PhantomJS, which loads all of the page's supporting files and executes the JavaScript, so that you can then parse the resulting DOM and extract the content.
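
To make that point concrete, here is a minimal sketch using only the standard library. RAW_PAGE is a hypothetical stand-in for what requests receives, not real Gmail output: an empty container plus a script carrying escaped JSON shaped like the fragment in the question. The names StaticView, inspect_static_html and decode_embedded_json are illustrative, and the decoding step assumes the embedded string literal happens to be valid JSON.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: no message markup, only an empty container and a script
# holding escaped JSON (shape borrowed from the output shown in the question).
RAW_PAGE = (
    '<html><body><div id="app"></div>'
    '<script>var DATA = "[{\\"1\\":\\"be_35\\",\\"53908043\\":0},{\\"1\\":\\"su_ltz\\"}]";</script>'
    '</body></html>'
)


class StaticView(HTMLParser):
    """Records what a client that does not run JavaScript can see."""

    def __init__(self):
        super().__init__()
        self.tags = []          # every start tag, in document order
        self.script_text = ""   # raw text found inside <script> elements
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.script_text += data


def inspect_static_html(page):
    """Parse the markup as-is, without executing any script."""
    view = StaticView()
    view.feed(page)
    view.close()
    return view


def decode_embedded_json(script_text):
    """Extract the quoted string literal and decode it into Python data."""
    start = script_text.index('"')
    end = script_text.rindex('"') + 1
    literal = script_text[start:end]
    # First loads() unescapes the string literal; the second parses the JSON in it.
    return json.loads(json.loads(literal))


view = inspect_static_html(RAW_PAGE)
print("tags seen without JavaScript:", view.tags)
print("decoded script payload:", decode_embedded_json(view.script_text))
```

The static parse finds only the container and the script, never any per-message elements, which is why selecting by HTML tags on the requests output comes up empty. Getting the rendered DOM requires something that actually executes the JavaScript, as described above.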