web-crawler

Scrapy cannot select a form using XPath

Hello, I'm using Scrapy to build a crawler that collects contest questions and so on from the site gabarite.com.br. I can ge ... _questao=5104',500,600)">Notificar erro</a></li> </ul> </li> </ul>

How to make a web crawler access pages that need authentication? [closed]

Closed. This question needs to be more objective and is not currently accepting answers. ... tication of the site is simple, done via HTTPS. But I also have the option of typing a CAPTCHA to access the page with the files.

In which programming language does a crawler/scraper scan the DOM faster?

I developed a script in which I use PHP's DOMDocument class to build a crawler for a third-party site. The speed of the scri ... d like to know in which programming language a script for the same purpose would return a DOM scan result faster?
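
A speed comparison like this is easier to reason about when the parse step is timed on its own, separately from network latency. The following is a minimal sketch in Python, assuming the standard library's `html.parser`; the `TagCounter` class, the `time_parse` function, and the sample document are hypothetical names and data made up for illustration, not part of the original script.

```python
import time
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: counts start tags while walking the document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def time_parse(html, repeats=5):
    """Parse `html` several times; return (tag_count, best_seconds)."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        parser = TagCounter()
        start = time.perf_counter()
        parser.feed(html)
        parser.close()
        best = min(best, time.perf_counter() - start)
        count = parser.count
    return count, best


# Made-up sample document: 1000 list items inside one <ul>.
sample = "<ul>" + "<li><a href='#'>item</a></li>" * 1000 + "</ul>"
tags, seconds = time_parse(sample)
print(f"{tags} start tags parsed in {seconds:.6f} s")
```

Running the same measurement against an equivalent script in another language, on the same input, gives a like-for-like number; in many crawlers the download time dominates the parse time, so it is worth measuring both.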

Creating a PHP crawler [closed]

Closed. This question needs details or clarity and is not currently accepting answers. ... ta and images from some sites. I searched a lot, but so far I have not found anything very detailed. I appreciate the answers.

Scrapy for login

I took this code from the internet and changed it a bit to log in to the CPFL website, but when I use the command scrapy cra ... 'Action':'1', }, callback=self.after_login) def after_login(self, response): pass

Tweet Crawler

I am using the API provided by Twitter together with Python to fetch certain tweets. The problem is that I want to view the tw ... h through all tweets pulled for tweet in results: # printing the text stored inside the tweet object print(tweet.text)

Does carousel content harm SEO? Is hidden carousel content indexed?

I have a question about carousels and whether their contents are indexed by search crawlers. First of all, ... full contents of a carousel or just the first slide? From an SEO point of view, is it worth using this kind of "component"?

Problem collecting links from a website

Good morning! I am writing a program in Python to collect the links from a website. The part of the code to which the lin ... ) (Driver info: chromedriver=2.42.591088 (7b2b2dca23cca0862f674758c9a3933e685c27d5),platform=Windows NT 10.0.17134 x86_64)
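
For pages whose links are present in the static HTML, link collection does not strictly need a browser driver. The sketch below is an assumption-laden alternative to the chromedriver-based approach in the question, not the asker's code: it uses only Python's standard library, it does not execute JavaScript, and the names `LinkCollector` and `collect_links`, as well as the sample page and URLs, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (or with an empty one).
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return every link found in `html`, resolved against `base_url`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content for illustration.
page = (
    '<p><a href="/about">About</a> '
    '<a href="https://example.org/x">X</a> '
    '<a name="top">no href</a></p>'
)
links = collect_links(page, "https://example.com/")
print(links)
```

If the links are only inserted by scripts after the page loads, a browser-driven approach is still needed, since this parser sees only the markup it is given.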