Using translate.google.ru from a Node.js application

After reading the Habr article series "Web scraping with Node.js", I scraped the site ferra.ru and then, as an experiment, wanted to build a translator application. The idea is this: I enter a word in English, the Node.js application forwards it to translate.google.ru, and the result, the word in Russian, is printed to the console. Query string (I am translating the word "Speed" into Russian).

First I checked in the browser: I pressed F12 on the page and, sure enough, there is a <span class="tlid-translation .."> with a <span class="title .."> nested in it, which holds the actual translation. At first I was excited, thinking that I would just parse the received page with cheerio and the translator would be ready ;-) Naive ;-)

In fact, my request returns only the empty start page of translate.google.ru, without even the input fields.
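
One likely reason, assuming the request used a fragment-style address like the one in the answer's code (the actual query string is not shown in the question): everything after `#` is handled by the browser and is never sent to the server, so the server only sees a request for the start page. A small sketch with Node's built-in `URL` class illustrates the split:

```javascript
// Assumed address, modelled on the URL used in the answer's code.
const pageUrl = new URL(
  'https://translate.google.ru/#view=home&op=translate&sl=en&tl=ru&text=speed'
);

// Only the path and the query string become part of the HTTP request.
const sentToServer = pageUrl.pathname + pageUrl.search;
// The fragment stays in the browser and is read by the page's own scripts.
const keptInBrowser = pageUrl.hash;

console.log(sentToServer);
console.log(keptInBrowser);
```

So a plain HTTP client never delivers the word to Google at all; the translation appears only after the page's scripts have run in a browser.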

I also looked in the browser at what happens when I open the link:

  1. translate.google.ru itself is loaded (and it returns a cookie)
  2. then numerous scripts and resources are loaded
  3. somewhere in the middle of loading, a GET request with my word "speed" goes by: https://translate.google.ru/translate_a/single?client=webapp&sl=en&tl=ru&hl=ru&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&source=bh&ssel=0&tsel=0&kc=1&tk=747203.915042&q=speed
  4. the response to it is an array that contains both the original word and its translation: [[["speed", "Speed", null, null, 0] - catching this response would be enough
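
To make the structure of that request easier to see, here is a small sketch that splits the observed URL into its parameters with Node's built-in `URL` class. The helper `firstTranslation` is hypothetical and assumes the parsed response has the nested-array shape from step 4, with the translation in the first position of the first segment. The `tk` value is presumably tied to the text being translated, so replaying this URL with a different word is not expected to work as is.

```javascript
// The request observed in the browser's network tab (copied from step 3).
const observed = new URL(
  'https://translate.google.ru/translate_a/single?client=webapp&sl=en&tl=ru&hl=ru' +
  '&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t' +
  '&source=bh&ssel=0&tsel=0&kc=1&tk=747203.915042&q=speed'
);

const query = observed.searchParams;
const summary = {
  sourceLang: query.get('sl'),   // language to translate from
  targetLang: query.get('tl'),   // language to translate into
  text: query.get('q'),          // the word being translated
  token: query.get('tk'),        // value that changes with the request
  dataTypes: query.getAll('dt'), // "dt" is repeated, so collect every value
};
console.log(summary);

// Hypothetical helper: ASSUMES the parsed response looks like
// [[[translated, original, ...], ...], ...] as in step 4.
function firstTranslation(parsedResponse) {
  const firstSegment = parsedResponse?.[0]?.[0];
  return Array.isArray(firstSegment) ? firstSegment[0] : undefined;
}
```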

Of course, I will think about how to solve the problem myself, but I would appreciate pointers from the community on which direction to look in (as a very last resort, I would use the Yandex.Translate API ;-))

Sincerely, Gerasim.

1 answer

After messing around with HTTP requests, I ran into the intractable problem of computing the hash. So I took the browser-emulation route, using Nightmare. Here is a draft of the source code: I take one English word from the command line to translate into Russian, the Electron browser starts, I find the elements containing the translation on the resulting page, extract the translation from them, and print the answer to the console.

    const Nightmare = require('nightmare')
    const nightmare = Nightmare({ show: false })

    // word to translate, taken from the command line
    const engWord = process.argv[2]
    if (!engWord) {
      console.error('Usage: node <script> <english word>')
      process.exit(1)
    }
    // build the request URL (if several words need translating, they are separated by spaces in the request)
    const url = `https://translate.google.ru/#view=home&op=translate&sl=en&tl=ru&text=${encodeURIComponent(engWord)}`
    console.log(url) // check the resulting address
    /* example of the elements to look for on the page:
    <span class="tlid-translation translation">
      <span title="">скорость</span>
    the answer is in the spans nested inside the span with class="tlid-translation"
    */
    nightmare
      .goto(url)
      // wait until the page's scripts have rendered the translation
      .wait('.tlid-translation')
      .evaluate(() => {
        /* select the nodes nested inside the span with class tlid-translation */
        const elements = document.querySelector('.tlid-translation').childNodes
        /* collect the answers into an array, in document order */
        return Array.from(elements, el => el.textContent)
      })
      .end()
      .then(result => {
        console.log(result) // the translation
      })
      .catch(error => {
        console.error('Search failed:', error)
      })
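
If the input may contain several words or non-ASCII characters, it is safer to encode it before putting it into the address. A minimal sketch follows; `buildTranslateUrl` is a hypothetical helper name, and the `#view=home&op=translate` address format is simply the one used in the code above, which may change on Google's side.

```javascript
// Hypothetical helper: builds the translator page address for a word or phrase.
function buildTranslateUrl(text, from = 'en', to = 'ru') {
  // encodeURIComponent turns spaces into %20 and escapes characters such as & and #
  const encoded = encodeURIComponent(text.trim());
  return `https://translate.google.ru/#view=home&op=translate&sl=${from}&tl=${to}&text=${encoded}`;
}

// Join all command-line arguments so that a phrase works without quotes;
// fall back to a sample phrase when no arguments are given.
const phrase = process.argv.slice(2).join(' ') || 'high speed';
console.log(buildTranslateUrl(phrase));
```

The returned string can be passed to `.goto()` in place of the hand-built address.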
Author: Gerasim Gerasimov, 2019-02-11 11:32:35