Using translate.google.ru from a Node.js application
After reading a series of articles on Habr, "Web scraping with Node.js", I wrote a scraper for the site ferra.ru and then, as an experiment, wanted to build a translator application. The idea is this: I enter a word in English, the Node.js application forwards it to translate.google.ru, and the result (the word in Russian) is printed to the console. Here is the query string (translating the word "speed" into Russian).
First I checked in the browser: I pressed F12 on the page and, sure enough, there is a <span class="tlid-translation .."> with a <span class="title .."> nested in it, which holds the actual translation. At first I was inspired, thinking that I would just parse the received page with cheerio and the translator would be ready ;-) How naive ;-)
In fact, my request returns only the empty start page of translate.google.ru, without even the input fields.
I also watched in the browser what happens when I open the link:
- first, translate.google.ru itself is loaded (and it sets a cookie)
- then numerous scripts and resources are loaded
- somewhere in the middle of loading, a GET request with my word "speed" goes out: https://translate.google.ru/translate_a/single?client=webapp&sl=en&tl=ru&hl=ru&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&source=bh&ssel=0&tsel=0&kc=1&tk=747203.915042&q=speed
- the response is an array that contains both the original word and its translation: [[["speed", "Speed", null, null, 0] - if I could capture this response, that would be enough
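For the plain-HTTP route, the request above can be rebuilt with URLSearchParams and its response parsed as JSON. This is a minimal sketch under several assumptions: the translate_a/single endpoint is undocumented, so the parameter list is simply copied from the observed URL, and the response layout (element 0 is a list of segments, each starting with [translatedText, originalText, ...]) is inferred from responses like the one in the question. The tk token is computed by Google's page scripts and is NOT calculated here; it is passed in as an opaque string. The helper names buildTranslateUrl and extractTranslation are hypothetical.

```javascript
// Sketch only: parameters and response layout are assumptions based on the
// request observed in the browser's network tab.

// Rebuild the observed request URL (hypothetical helper).
// `tk` is treated as an opaque token; it is not computed here.
function buildTranslateUrl(text, tk, { from = 'en', to = 'ru', hl = 'ru' } = {}) {
  // An array of pairs keeps the parameters in the same order as the observed URL
  const params = new URLSearchParams([
    ['client', 'webapp'], ['sl', from], ['tl', to], ['hl', hl],
  ])
  // `dt` is repeated once per requested data type
  for (const dt of ['at', 'bd', 'ex', 'ld', 'md', 'qca', 'rw', 'rm', 'ss', 't']) {
    params.append('dt', dt)
  }
  params.append('source', 'bh')
  params.append('ssel', '0')
  params.append('tsel', '0')
  params.append('kc', '1')
  params.append('tk', tk)
  params.append('q', text)
  return `https://translate.google.ru/translate_a/single?${params}`
}

// Extract the translated text from a response body (hypothetical helper).
// Assumed layout: data[0] is an array of segments [translated, original, ...];
// segments whose first element is not a string are skipped.
function extractTranslation(body) {
  const data = JSON.parse(body)
  if (!Array.isArray(data[0])) return ''
  return data[0]
    .filter(segment => typeof segment[0] === 'string')
    .map(segment => segment[0])
    .join('')
}

// Usage with the token from the question and a made-up response body
console.log(buildTranslateUrl('speed', '747203.915042'))
console.log(extractTranslation('[[["скорость","speed",null,null,0]],null,"en"]')) // prints "скорость"
```

The token taken from the question is only valid for that text and session, so a real request still needs a way to obtain tk, which is exactly the obstacle the answer below runs into.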
Of course, I will keep thinking about how to solve the problem myself, but I would like the community to point me in the right direction (as a last resort, there is always the Yandex.Translate API ;-))
Sincerely, Gerasim.
1 answer
After experimenting with HTTP requests, I ran into a problem I could not solve: computing the hash (presumably the tk token in the request URL). So I went the browser-emulation route, using Nightmare. Here is a draft of the source code: it takes one English word from the command line to translate into Russian, starts the Electron browser, finds the elements containing the translation on the resulting page, extracts the translation from them, and prints it to the console.
const Nightmare = require('nightmare')
const nightmare = Nightmare({ show: false }) // run Electron without showing a window
const engWord = process.argv[2] // the word to translate, taken from the command line
if (!engWord) {
  console.error('Usage: node <script> <english word>')
  process.exit(1)
}
// build the query URL (to translate several words, pass them as one quoted argument; they are separated by spaces)
const URL = `https://translate.google.ru/#view=home&op=translate&sl=en&tl=ru&text=${encodeURIComponent(engWord)}`
console.log(URL) // check the resulting URL
/* example of the elements I will look for on the page:
<span class="tlid-translation translation">
  <span title="">скорость</span>
</span>
the answer is contained in the spans nested in the span with class="tlid-translation"
*/
nightmare
  .goto(URL)
  // the translation is filled in by page scripts after loading, so wait for it to appear
  .wait('.tlid-translation > span')
  .evaluate(() => {
    // select the spans nested in the span with class tlid-translation
    const spans = document.querySelectorAll('.tlid-translation > span')
    // collect the translation fragments as plain text, in document order
    return Array.from(spans, span => span.textContent)
  })
  .end()
  .then(result => {
    console.log(result) // the translation
  })
  .catch(error => {
    console.error('Search failed:', error)
  })
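The root difficulty in both the question and the answer is that the translation is not in the initial HTML: page scripts insert it after the background request completes, so a scraper has to wait for it to appear (Nightmare's .wait(selector) serves this purpose). The idea behind such a wait can be sketched as a small polling helper. This is an illustration, not Nightmare's implementation: waitFor and its timeout/interval options are hypothetical names, and the "page" below is simulated with a timer.

```javascript
// Poll `check` until it returns a truthy value or the timeout expires.
// Hypothetical helper illustrating what "wait for the element" means.
function waitFor(check, { timeout = 5000, interval = 100 } = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now()
    const tick = () => {
      let value
      try {
        value = check()
      } catch (err) {
        return reject(err) // a failing check aborts the wait
      }
      if (value) return resolve(value)
      if (Date.now() - started >= timeout) {
        return reject(new Error(`waitFor: timed out after ${timeout} ms`))
      }
      setTimeout(tick, interval) // not there yet: try again later
    }
    tick()
  })
}

// Usage: simulate a page where the translation arrives after ~300 ms
let translation = null
setTimeout(() => { translation = 'скорость' }, 300)
waitFor(() => translation, { timeout: 2000, interval: 50 })
  .then(result => console.log('got:', result))
  .catch(err => console.error(err.message))
```

Reading the page immediately after it loads (as the original draft did) corresponds to calling check() once with no polling, which is why it can return nothing even though the translation shows up in the browser a moment later.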