Parsing a site with BeautifulSoup

Question


import requests
import csv
from bs4 import BeautifulSoup

url = 'https://www.zakon.kz/news/page/3'


def parse(url):
    news = []
    response = requests.get(url)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    products = soup.find_all('div', {'class': 'cat_news_item'})
    for product in products:
        title = product.find('a')
        date = product.find('span')
        news.append({
            'name': [name.text for name in product.find_all('a')]
        })
        for i in news:
            print(i)

        with open('zakon.csv', 'w') as files:
            opp = csv.writer(files)
            opp.writerows(i)


print(parse(url))

I can't figure out what the problem is: why isn't the data written correctly to the CSV file?


python parser requests beautiful-soup csv

Author: strawdog, 2019-11-19


3 answers

score 0 · Answer 1

Since you were building dictionaries for the output data, I assume you actually wanted the output in JSON format rather than CSV.

To do this, import json:

import json

And write it like this:

with open('zakon.json', 'w', encoding="utf-8") as files:
    json.dump(news, files, ensure_ascii=False, indent=2)

The file zakon.json will then begin like this:

[
  {
    "name": []
  },
  {
    "name": [
      "Китай взял пример с США и ударил по Microsoft"
    ]
  },
  {
    "name": [
      "Крысы захватили столичный микрорайон"
    ]
  },
  {
    "name": [
      "\"Дала пощечину и зажала рот\". Няня призналась, как укладывала ребенка спать"
    ]
  },
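The ensure_ascii=False argument is what keeps the Cyrillic headlines readable in the file; with the default (ensure_ascii=True), the json module escapes every non-ASCII character as \uXXXX. A minimal comparison, assuming a one-item sample list in the same shape as news (the headline is copied from the output above), using json.dumps so the result can be inspected as a string:

```python
import json

# Sample data in the same shape as the question's `news` list
news = [{"name": ["Крысы захватили столичный микрорайон"]}]

escaped = json.dumps(news)                       # default: ensure_ascii=True
readable = json.dumps(news, ensure_ascii=False)  # keeps Cyrillic characters as-is

print(escaped)   # Cyrillic letters appear as \uXXXX escape sequences
print(readable)  # the headline is printed as readable text
```

Both strings decode to the same data with json.loads; the difference is only in how readable the file is.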

score 0 · Answer 2

The errors are in this part:

for i in news:
    print(i)

with open('zakon.csv', 'w') as files:
    opp = csv.writer(files)
    opp.writerows(i)

First:

After the for i in news: loop finishes, the variable i holds only the last element of the list news, and only that element is written to the zakon.csv file.
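In Python a for loop does not create its own scope, so the loop variable still exists after the loop and keeps the last value it was given. A tiny illustration, with a made-up list standing in for news:

```python
# Made-up data in the same shape as the question's `news` list
news = [{"name": ["first"]}, {"name": ["second"]}, {"name": ["third"]}]

for i in news:
    pass  # the loop body does not matter here

# `i` is still defined after the loop and refers to the last element
print(i)  # {'name': ['third']}
```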

Second:

opp.writerows() expects an iterable of rows (for example, a list or a tuple; even a dictionary is iterable), where each row is itself an iterable of fields and becomes one line of the output file. You passed i, which, as explained above, is the last item of the list of dictionaries, that is, a single dictionary.

Iterating over a dictionary yields its keys, and your dictionary has only one key, 'name'. writerows() therefore treats that key as a row; since it is a string, it is iterated character by character as ['n', 'a', 'm', 'e'] and written to the output file as n,a,m,e, which is exactly the unwanted result you got.
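This behavior can be reproduced without a real file by writing to an in-memory io.StringIO buffer. A minimal sketch, assuming a dictionary shaped like the last element of news (the headline text is a placeholder):

```python
import csv
import io

# Placeholder dictionary shaped like the last element of `news`
i = {"name": ["Some headline"]}

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(i)  # iterates the dict -> key 'name' -> its characters

print(repr(buffer.getvalue()))  # 'n,a,m,e\r\n'
```

The value ['Some headline'] never reaches the file, because iterating the dictionary only visits its keys.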

Correction:

with open('zakon.csv', 'w', encoding="utf-8", newline='') as files:
    opp = csv.writer(files)

    for i in news:
        print(i)
        if i['name']:                # only when the list is non-empty
            opp.writerow(i['name'])  # or any other list (yours have only one element each)

The file zakon.csv will then begin like this:

Китай взял пример с США и ударил по Microsoft
Крысы захватили столичный микрорайон
"""Дала пощечину и зажала рот"". Няня призналась, как укладывала ребенка спать"
Зеленский и Путин встретятся до пресс-конференции нормандского саммита
"Идеальное место, чтобы выкинуть ребенка - алматинцы о страшной находке"
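The odd-looking quotes in the third line above are standard CSV quoting, not corruption: with the default QUOTE_MINIMAL setting, csv.writer wraps a field in double quotes when it contains the delimiter, a quote character, or a newline, and doubles every quote inside it. A small check, assuming two of the headlines above as sample rows and an in-memory buffer instead of zakon.csv:

```python
import csv
import io

rows = [
    ['"Дала пощечину и зажала рот". Няня призналась, как укладывала ребенка спать'],
    ["Крысы захватили столичный микрорайон"],
]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
lines = buffer.getvalue().splitlines()

print(lines[0])  # """Дала пощечину и зажала рот"". Няня призналась, как укладывала ребенка спать"
print(lines[1])  # Крысы захватили столичный микрорайон

# csv.reader reverses the quoting when the data is read back
restored = list(csv.reader(io.StringIO(buffer.getvalue())))
print(restored == rows)  # True
```

Reading the file with csv.reader (or a spreadsheet program) gives back the original headline text.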

score 0 · Answer 3

Perhaps this is what you wanted:

import requests
import csv
from bs4 import BeautifulSoup

url = 'https://www.zakon.kz/news/page/3'


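def parse(url):
    news = []
    response = requests.get(url)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # Each news item on the page is a <div class="cat_news_item">
    products = soup.find_all('div', {'class': 'cat_news_item'})
    for product in products:
        title = product.find('a')    # headline link
        date = product.find('span')  # publication time
        if title and date:           # skip items missing either element
            news.append([date.text, title.text])  # one CSV row: time, headline

    # Open the file once, after the loop, and write all rows at once
    with open('zakon.csv', 'w', encoding="utf-8", newline='') as files:
        opp = csv.writer(files)
        opp.writerows(news)

parse(url)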

The file zakon.csv will then begin like this:

23:55,"Автобусные полосы в Алматы открыли для такси, но по ним никто не ездит"
23:37,Котлован под фундамент высотки вырыли в нескольких шагах от четырехэтажки
23:02,Семьям погибших вахтовиков выплатили по миллиону
22:46,В молочном союзе рассказали о требованиях к производителям молока
22:15,Бывшие вице-министры энергетики предстанут перед судом еще до конца года
21:36,"Сотрудники ""неотложки"" спасли из горящего дома двоих детей"
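Answers 2 and 3 open the file with newline=''. The csv module documentation recommends this because csv.writer already ends every row with '\r\n' (the default lineterminator of the excel dialect); if the file object also translates '\n' into '\r\n', as text files do on Windows, each row ends in '\r\r\n', which many programs display as an extra blank line. A sketch that shows both cases with io.StringIO, assuming made-up sample rows; StringIO(newline="\r\n") is used here only to simulate the Windows translation:

```python
import csv
import io

# Made-up rows in the same shape as Answer 3's `news` list
rows = [["23:55", "Sample headline"], ["23:37", "Another headline"]]

# newline="" disables newline translation, like open(..., newline="")
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
print(repr(buffer.getvalue()))  # '23:55,Sample headline\r\n23:37,Another headline\r\n'

# Simulate a text file that translates "\n" to "\r\n" (Windows without newline="")
translated = io.StringIO(newline="\r\n")
csv.writer(translated).writerows(rows)
print(repr(translated.getvalue()))  # '23:55,Sample headline\r\r\n23:37,Another headline\r\r\n'

print(repr(csv.excel.lineterminator))  # '\r\n'
```

On Linux and macOS the missing newline='' usually goes unnoticed, which is why the problem often shows up only on Windows.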