bs4.FeatureNotFound (error with BeautifulSoup and the parser)

I need to extract all the text from an HTML page, so I decided to try BeautifulSoup to see how it could be done. But it fails with an error right at the start. Here is the code:

import requests
from bs4 import BeautifulSoup

url = 'http://servicos2.sjc.sp.gov.br/servicos/horario-e-itinerario.aspx?acao=p&opcao=1&txt='
r = requests.get(url)
print(r.text)

soup = BeautifulSoup(r.text, 'lxml')
lista = soup.find_all('table', class_='textosm')
print(lista)

The error it gives is

Traceback (most recent call last):
  File "C:/Users/Ariane/PycharmProjects/extracao/teste.py", line 9, in <module>
    soup = BeautifulSoup(r.text, 'lxml')
  File "C:\Users\Ariane\PycharmProjects\extracao\venv\lib\site-packages\bs4\__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

I installed lxml and also tried changing the parser to html.parse, but the error remains the same.
Can anyone help?

Author: Icaro Martins, 2019-03-30

3 answers

You asked BeautifulSoup to use lxml, and the error message says:

Couldn't find a tree builder with the features you requested: lxml.

In other words:

No tree builder (parser) providing the feature you requested, lxml, could be found.

If you read the documentation, you will see what these features are. The parsers currently listed there are Python's html.parser, lxml's HTML parser, lxml's XML parser, and html5lib.

For "lxml's XML parser", the documentation also notes:

External C dependency

That is, an extra library is required. In the case of lxml, install it from CMD with:

pip install lxml
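
As a stopgap while lxml is not importable, a script can check which parser libraries are installed and fall back to the built-in html.parser. This is only a minimal sketch: `pick_parser` and `PARSER_MODULES` are hypothetical names invented for this illustration, not part of bs4, and the check only looks at whether the module can be found by the interpreter running the script.

```python
import importlib.util

# Maps a Beautiful Soup feature name to the module that must be importable
# for that parser to work. "html.parser" is built in, so it needs no entry.
# (Hypothetical helper table, not part of bs4.)
PARSER_MODULES = {"lxml": "lxml", "html5lib": "html5lib"}


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first preferred parser whose module is installed,
    falling back to the built-in "html.parser"."""
    for name in preferred:
        module = PARSER_MODULES.get(name)
        if module is not None and importlib.util.find_spec(module) is not None:
            return name
    return "html.parser"


if __name__ == "__main__":
    parser = pick_parser()
    print("Parser to use:", parser)
    # Then, in the question's script:
    # soup = BeautifulSoup(r.text, parser)
```

If this prints html.parser even after running pip install lxml, the package was most likely installed for a different interpreter than the one running the script.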
 2
Author: Guilherme Nascimento, 2019-03-31 03:26:37

From what the error says, it seems that you do not have lxml installed. To install it, you can use one of the console commands below.

pip install lxml
# or
python3 -m pip install lxml
# ^
# the python you use to run your `.py` files

Beautiful Soup 4 Documentation: Installing a parser

 0
Author: Icaro Martins, 2019-03-31 03:24:56

Hello! I was having the same problem as you.

  • I use Anaconda, and through CMD I installed lxml in the correct environment. Up to that point everything was fine, because I was running my script directly from cmd. However, when I tried the VSCode terminal, I ran into the same problem you reported.

What I did: I went into the VSCode user settings and set cmd as the "terminal.integrated.shell.windows". I did this because I noticed that the problem occurred when using PowerShell (which is the default integrated terminal in VSCode).

I don't know why PowerShell caused the problem, but after I switched to cmd my script ran without problems. Maybe you could check which shell your IDE uses. (In my tests I also tried bash, and the same lxml problem occurred.)
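
One plausible explanation, which I have not confirmed, is that each shell starts a different Python interpreter or environment. A small diagnostic like the sketch below can be run from cmd, PowerShell, and the IDE terminal to compare them. The function name `describe_environment` is made up for this illustration, and it uses only the standard library.

```python
import importlib.util
import shutil
import sys


def describe_environment(module_name="lxml"):
    """Collect facts about the interpreter that is actually running."""
    return {
        # Full path of the running interpreter
        "executable": sys.executable,
        # Root of the active environment (venv, conda env, or system install)
        "prefix": sys.prefix,
        # What the shell would start for the bare command `python` (may be None)
        "python_on_path": shutil.which("python"),
        # Whether the module can be found by this interpreter
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `executable` or `prefix` differs between two shells, lxml needs to be installed into the environment that the failing shell actually uses.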

I hope I helped. Thanks.

 0
Author: Filipe Jorge, 2019-08-16 02:50:56