Website with hidden HTML

I need to extract sales data for used cars from several sites.

One of the sites belongs to the company Locamerica. However, the content I need to extract does not appear in the HTML of its page.

I need to extract the data for each car listed on the page, but the cars do not appear in the HTML. Not even the links to each car's page appear.

I downloaded the page source and opened it, and the same site appears but without any cars. LINK to the HTML I get

I'm programming in Python, using Requests to fetch the page's HTML and Beautiful Soup to extract the data I need.

The code

import requests as req
from bs4 import BeautifulSoup as bs

url = "https://seminovos.locamerica.com.br/seu-carro?combustivel=&cor=&q=&cambio=&combustiveis=&cores=&acessorios=&estado=0&loja=0&marca=0&modelo=0&anode=&anoate=&per_page={}&precode=0&precoate=0"
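indice_pagina = 1  # page index substituted into the per_page parameter

r = req.get(url.format(indice_pagina))
print(r.text)  # the HTML printed here contains no car listings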
Author: Rafael Ribeiro, 2018-05-19

1 answer

This happens because the page initially does not contain the information about the cars. It loads empty, and then uses JavaScript to dynamically load the data and insert it into the page.

One way around this is to use a WebDriver tool such as Selenium. Essentially, you run a real browser that is controlled by your Python program.

It is best to avoid this when possible, though. Running an entire browser means loading all the images, scripts and advertisements, so the process is considerably slower than making plain requests.

What you can do instead is open your browser's developer tools, go to the Network tab, and observe the requests your browser makes while loading the page. Sometimes the interesting content is loaded by a simple call to an API on the site. In that case, you can make your request directly to that API.
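When the Network tab lists many requests, one option is to export them as a HAR file (most browsers' developer tools offer this) and filter for JSON responses in Python. The following is only a minimal sketch of that idea, not something taken from the investigation described here; the function name `json_request_urls` and the sample entries are assumptions made for illustration.

```python
def json_request_urls(har):
    """Return the URLs of the requests in a HAR dict whose response is JSON."""
    urls = []
    for entry in har["log"]["entries"]:
        content = entry.get("response", {}).get("content", {})
        mime = content.get("mimeType") or ""
        if "json" in mime.lower():
            urls.append(entry["request"]["url"])
    return urls


# Made-up HAR data with the same shape as a browser export.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.com/index.html"},
                "response": {"content": {"mimeType": "text/html"}},
            },
            {
                "request": {"url": "https://example.com/data.json"},
                "response": {"content": {"mimeType": "application/json"}},
            },
        ]
    }
}

print(json_request_urls(sample_har))
```

With a real export, the dict would come from `json.load()` on the saved `.har` file rather than from `sample_har`.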

I did this and saw a few things that looked interesting:

[Screenshot: requests listed in the browser's Network tab]

The other JSON requests are not interesting; they look like filter options and dealerships. This one, however, seemed a bit strange to me: it did not return the car information directly, but its odd format looked like it could be Base64.

I copied the veiculos field and pasted it into a decoder site to confirm my suspicion, and indeed the message decodes to HTML:

[Screenshot: the decoded veiculos field, showing HTML]

As a proof of concept, here is how to get this HTML with Python:

import requests
import base64

url = 'https://seminovos.locamerica.com.br/veiculos.json?marca=&precode=&precoate=&ano_de=0&cambio=&acessorios=&current_url=https://seminovos.locamerica.com.br/seu-carro?marca=&cambio=&combustivel=&cor=&acessorios=&anode=0&precode=&precoate='

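r = requests.get(url)
# The 'veiculos' field holds the listing HTML encoded as Base64
info = r.json()['veiculos']
# b64decode returns bytes; decoding as UTF-8 is an assumption about the site's encoding
info_decoded = base64.b64decode(info).decode('utf-8')

print(info_decoded)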
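Once decoded, the HTML can be parsed like any other page. The sketch below uses the standard library's `html.parser` to collect link targets from a Base64-encoded fragment. The sample payload is made up, since the real structure of the veiculos markup is not shown here, and the names `LinkCollector` and `links_from_encoded_html` are assumptions for illustration.

```python
import base64
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_from_encoded_html(encoded, encoding="utf-8"):
    """Decode a Base64 string into HTML text and return the links it contains."""
    # The text encoding is an assumption; adjust it if the site uses another one.
    html = base64.b64decode(encoded).decode(encoding)
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Made-up payload standing in for r.json()['veiculos'].
sample_html = '<div><a href="/carro/1">Car 1</a><a href="/carro/2">Car 2</a></div>'
sample_encoded = base64.b64encode(sample_html.encode("utf-8")).decode("ascii")

print(links_from_encoded_html(sample_encoded))
```

The decoded string could just as well be handed to Beautiful Soup, as in the question, instead of `html.parser`.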
Author: Pedro von Hertwig Batista, 2018-05-20 00:35:11