bs4.FeatureNotFound (error in BeautifulSoup and parser)
I need to extract all the text from an HTML page, so I decided to take a look at BeautifulSoup to see how to do it. But it started failing right at the beginning. Here is the code:
import requests
from bs4 import BeautifulSoup
url = 'http://servicos2.sjc.sp.gov.br/servicos/horario-e-itinerario.aspx?acao=p&opcao=1&txt='
r = requests.get(url)
print(r.text)
soup = BeautifulSoup(r.text, 'lxml')
lista = soup.find_all('table', class_='textosm')
print(lista)
The error it gives is
Traceback (most recent call last):
File "C:/Users/Ariane/PycharmProjects/extracao/teste.py", line 9, in <module>
soup = BeautifulSoup(r.text, 'lxml')
File "C:\Users\Ariane\PycharmProjects\extracao\venv\lib\site-packages\bs4\__init__.py",
line 196, in __init__ % ",".join(features))
bs4.FeatureNotFound:
Couldn't find a tree builder with the features you requested: lxml.
Do you need to install a parser library?
I installed lxml and also changed the parser to html.parse, but the error remains the same.
Can anyone help?
3 answers
You requested the lxml parser, and the error message reports:
Couldn't find a tree builder with the features you requested: lxml.
In other words:
Beautiful Soup cannot find a tree builder (parser) with the feature you requested: lxml
If you read the documentation you will see what these features are.
The parsers it currently lists are Python's html.parser, lxml's HTML parser, lxml's XML parser, and html5lib.
For the lxml parsers, the documentation also notes:
External C dependency
That is, an extra library is required. In the case of lxml, install it from CMD with:
pip install lxml
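If lxml cannot be installed right away, one option is to fall back to a parser that is available. The sketch below is only an illustration: the helper name pick_parser and the preference order are assumptions, not part of Beautiful Soup. It only checks whether the backing module can be imported by the interpreter that runs the script.

```python
import importlib.util

# (Beautiful Soup feature name, module that must be importable for it).
# None means the parser ships with Python itself.
# The preference order here is an assumption made for illustration.
PARSER_MODULES = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
    ("html.parser", None),
]


def pick_parser():
    """Return the first parser name whose backing module is importable."""
    for feature, module in PARSER_MODULES:
        if module is None or importlib.util.find_spec(module) is not None:
            return feature
    # Not reached with the list above, kept as a safe default.
    return "html.parser"


if __name__ == "__main__":
    print("Parser to pass to BeautifulSoup:", pick_parser())
```

The result can then be passed as the second argument, for example BeautifulSoup(r.text, pick_parser()). Note that the built-in parser is named html.parser.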
Judging by the error, it seems that you do not have lxml installed. To install it, you can use one of the console commands below.
pip install lxml
# or
python3 -m pip install lxml
# ^
# the python you use to run your `.py` files
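Since the traceback shows the script running from a virtual environment (the venv folder inside the project), a plain pip install may target a different Python. As a rough sketch, the script itself can print the install command bound to the interpreter that is actually running it. The helper name pip_install_command is hypothetical.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is installed
    # by accident. Quote the first item if the path contains spaces.
    print(" ".join(pip_install_command("lxml")))
```

Running this from the same IDE configuration that produced the error shows which python should receive the lxml installation.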
Hello! I was having the same problem as you.
I use Anaconda, and I installed lxml in the correct environment through CMD. Up to that point everything was fine, because I was running my script directly from cmd. However, when I tried to use the VSCode terminal, I ran into the same problem you reported.
What I did: I went into the VSCode user settings and set cmd as the value of "terminal.integrated.shell.windows". I did this because I noticed that the problem occurred when using PowerShell (which is the default integrated terminal in VSCode).
I don't know why PowerShell caused the problem, but after I switched to cmd my script ran without problems. Maybe you could check which shell your IDE uses. (In my tests I also tried bash, and the same lxml problem occurred.)
I hope this helps. Thanks.
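One possible explanation, stated here only as an assumption, is that each shell resolves a different Python or does not activate the conda environment. A small diagnostic like the one below, run once from each shell, would show whether they differ. The function name describe_python_environment is made up for this example.

```python
import os
import shutil
import sys


def describe_python_environment():
    """Collect the details that usually differ between shells."""
    return {
        # Interpreter that is executing this script.
        "running_interpreter": sys.executable,
        # First "python" found on PATH in this shell (None if absent).
        "python_on_path": shutil.which("python"),
        # Root of the active installation or virtual environment.
        "prefix": sys.prefix,
        # Set by conda when an environment is activated (None otherwise).
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for name, value in describe_python_environment().items():
        print(f"{name}: {value}")
```

If the output differs between cmd and PowerShell, lxml was probably installed for only one of those interpreters.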