In which programming language does a crawler/scraper scan the DOM fastest?

I developed a script that uses PHP's DOMDocument class to crawl a third-party site.

The script is not as fast as I need it to be. In which programming language would a script for the same purpose return DOM scan results faster?

Author: Maniero, 2017-11-23

1 answer

Programming languages do not have speed as a characteristic. Some have features that help you write faster code. Libraries, on the other hand, can be fast or slow, and you are not obliged to use the standard one. If the standard library does not meet your performance requirements, which is rare, quite rare, then look for another one.

What gives the most speed is using the right data structure and the right algorithm. The difference between the right choice and the wrong one can be a task that finishes in under 1 second versus one that takes centuries. Cases of that proportion exist, and they are not few.
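To make the data structure point concrete, here is a minimal sketch in Python (chosen only for illustration; the same idea applies in PHP or any other language). It assumes a crawler that must skip URLs it has already seen. The function names and the `example.com` URLs are hypothetical. Both functions return the same result, but the list version rescans everything it has collected on every check, while the set version does a hash lookup.

```python
import timeit

def dedupe_with_list(urls):
    """Quadratic: each `in` check scans the whole list of seen URLs."""
    seen = []
    for url in urls:
        if url not in seen:
            seen.append(url)
    return seen

def dedupe_with_set(urls):
    """Linear on average: each `in` check is a hash lookup."""
    seen = set()
    ordered = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered

# Hypothetical crawl frontier: 2,000 distinct URLs, each appearing twice.
urls = [f"https://example.com/page/{i}" for i in range(2000)] * 2

list_seconds = timeit.timeit(lambda: dedupe_with_list(urls), number=1)
set_seconds = timeit.timeit(lambda: dedupe_with_set(urls), number=1)
print(f"list: {list_seconds:.4f}s  set: {set_seconds:.4f}s")
```

The gap widens as the number of URLs grows, because the list version does work proportional to the square of the input size. No change of language fixes that; changing the data structure does.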

Choosing a faster language can turn something that takes 1 minute into something that takes less than 1 second, no more than that, and only in a few cases does it make that much difference. And that is between languages with glaring differences, for example one of the worst Ruby implementations compared to very well-written Assembly.

Assembly is the language that allows the best possible performance. In practice, though, it is now so difficult to write correct and fast code in Assembly that code written in C will almost always be faster. In some cases C++, Rust, or Fortran may be better. In Delphi, Java, and C#, just to name a few, most tasks will run with minimal difference, and even where those languages do badly the difference is taking 3 seconds where C would take less than 1 (in almost all cases the difference is much, much smaller, small enough to be almost laughable).

If you want to stay with scripting languages, JavaScript (or perhaps TypeScript) and Lua, especially the LuaJIT implementation, should be the best options.

PHP doesn't perform that badly, especially in newer versions.

But if you do not have a good command of the language, of programming, and of the concepts described above, the result will not be good.

Most applications do not need as much performance as people think, and those that do often require hard, complex engineering work. So if a big performance gain is possible just by changing something, it is because the original was very wrong (but working, which makes people think it was right).

If you do it right, the bottleneck is likely to be fetching the information over the network, even in "slow languages".
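One way to check where the time goes is to measure the parsing step on its own and compare it with how long the page takes to download. The sketch below is in Python for illustration only, and uses the standard library's `html.parser` on a synthetic page, so the class name `LinkCounter`, the helper `time_parse`, and the page content are all assumptions. In PHP the same measurement could be taken by calling `microtime(true)` before and after `DOMDocument::loadHTML`.

```python
import time
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Counts <a> start tags while the document is parsed."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1

def time_parse(html):
    """Return (number of links found, seconds spent parsing)."""
    parser = LinkCounter()
    start = time.perf_counter()
    parser.feed(html)
    parser.close()
    return parser.links, time.perf_counter() - start

# Synthetic stand-in for a downloaded page: 2,000 links.
page = "<html><body>" + "<p><a href='/x'>x</a></p>" * 2000 + "</body></html>"

links, parse_seconds = time_parse(page)
print(f"{links} links parsed in {parse_seconds * 1000:.1f} ms")
```

If the parse time measured this way is small next to the time each HTTP request takes, rewriting the parser in another language will not change the total much.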

You can see a comparison of the languages. But note that it is called a "game" and is not a scientific method. Using it to make important decisions can backfire badly.

 3
Author: Maniero, 2020-07-01 18:36:44