import pandas as pd
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import time

driver = webdriver.Chrome()

# Store page URL (the percent-encoded part is the Japanese store name)
url = "https://www.ubereats.com/jp/kyoto/food-delivery/%E3%83%8F%E3%83%BC%E3%83%88%E3%83%AD%E3%83%83%E3%82%AF%E3%82%AB%E3%83%95%E3%82%A7-%E4%BA%AC%E9%83%BD%E5%BA%97-hard-rock-cafe-kyoto/U1gYCSr9QfyIVEwcA2U5cQ?pl=JTdCJTIyYWRkcmVzcyUyMiUzQSUyMiVFNCVCQSVBQyVFOSU4MyVCRCVFNSVCOCU4MiUyMiUyQyUyMnJlZmVyZW5jZSUyMiUzQSUyMkNoSUo4Y004emRhb0FXQVJQUjI3YXpZZGxzQSUyMiUyQyUyMnJlZmVyZW5jZVR5cGUlMjIlM0ElMjJnb29nbGVfcGxhY2VzJTIyJTJDJTIybGF0aXR1ZGUlMjIlM0EzNS4wMTE1NjQlMkMlMjJsb25naXR1ZGUlMjIlM0ExMzUuNzY4MTQ4OSU3RA%3D%3D"
driver.get(url)
time.sleep(5)  # wait for the page to finish rendering before scraping

soup = BeautifulSoup(driver.page_source, "html.parser")
# Note: these class names are auto-generated by Uber Eats and may change at any time
genre = soup.find(class_="bw bx by eg")
genre_name = genre.string
store_address = soup.find(class_='b8 b9 ba as em')
address = store_address.string
score = soup.find(class_="bw bx by eg au aw")
scores = score.get_text()
work_hours = soup.find(class_="en em")
hours = work_hours.get_text()
m = soup.find_all(class_="g0 g1 g2 aj")
m_list = str(len([menu.get_text() for menu in m]))  # number of menu items
d = soup.find_all(class_="bw bx by fz")
d_list = str(len([menu.get_text() for menu in d]))  # number of menu items with descriptions

Data = pd.DataFrame(
    {
        'Genre': [genre_name],
        'Address': [address],
        'Evaluation (number of evaluations)': [scores],
        'Business Hours': [hours],
        'Number of menus': [m_list],
        'Number of menus described': [d_list]
    })
Data.to_csv('data.csv')

I made the DataFrame above with this code.

What I don't understand is how to append the elements scraped from each store's page ("genre, address, evaluation (number of evaluations), business hours, number of menus, number of menus with descriptions") to the pandas DataFrame one store at a time. Is that possible, and if so, what code should I write?
As an image, it is like adding the acquired information as a new row under each column heading.
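
For example, the goal is a table shaped roughly like this (the store values here are purely hypothetical):

           Genre    Address    ...    Number of menus described
    0      Cafe     Kyoto      ...    12
    1      Ramen    Kyoto      ...    8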

The code below is a work in progress and incomplete. If there is a better approach, feel free to change it completely.

for i in URL_list:  # URL_list holds the store URLs that were already collected
    driver.get(i)
    time.sleep(5)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    genre = soup.find(class_="bw bx by eb")
    genre_name = genre.string
    store_address = soup.find(class_='b8 b9 ba as eh')
    address = store_address.string
    score = soup.find(class_="bw bx by eb au aw")
    scores = score.get_text()
    work_hours = soup.find(class_="ei eh")
    hours = work_hours.get_text()
    m = soup.find_all(class_="g2 g3 g4 aj")
    m_list = str(len([menu.get_text() for menu in m]))
    d = soup.find_all(class_="bw bx by ec")
    d_list = str(len([menu.get_text() for menu in d]))
    l_Data = pd.DataFrame(
        {
            'Genre': [genre_name],
            'Address': [address],
            'Evaluation (number of evaluations)': [scores],
            'Business Hours': [hours],
            'Number of menus': [m_list],
            'Number of menus described': [d_list]
        })  # overwritten on every iteration -- this is where I am stuck

Since all the store URLs have already been collected, my plan was a for loop: take each store's URL from URL_list, jump to that store's page with driver.get(), scrape "genre, address, evaluation (number of evaluations), business hours, number of menus, number of menus with descriptions", append it to the DataFrame, and repeat until I have the information for every store. But I did not know how to proceed from there.

Any advice would be appreciated.

  • Answer #1

    I think it's fine to save each row in a list and convert the list to a DataFrame at the end.

    data = []
    for i in URL_list:
        # ... the scraping code from the question goes here ...
        temp = {
            "Genre": genre_name,
            "Address": address,
            "Evaluation (number of evaluations)": scores,
            "Business Hours": hours,
            "Number of menus": m_list,
            "Number of menus described": d_list,
        }
        data.append(temp)  # one dict per store
    df = pd.DataFrame(data)  # build the DataFrame once, at the end
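
    As a follow-up, the combined table can then be written out the same way as in the question's code. A minimal sketch, assuming the loop above has completed (index=False is optional; it just drops pandas' row-index column):

    # df now holds one row per store and one column per field
    df.to_csv('data.csv', index=False)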

  • Answer #2

    https://stackoverflow.com/questions/293230
    My answer there is an example of appending scraped rows to a pandas DataFrame one after another. (It also handles None values, but that part can be omitted.)

    I think it is close to what you want. Please take a look.
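
    The linked answer itself is not reproduced above, but as an illustration of that row-by-row pattern, here is a minimal sketch using pd.concat (the row dicts are hypothetical placeholders; collecting rows in a list first, as in Answer #1, is usually faster because it avoids repeatedly copying the DataFrame):

    import pandas as pd

    df = pd.DataFrame([{"Genre": "Cafe"}])  # hypothetical first scraped row
    new_row = {"Genre": "Bar"}              # hypothetical next scraped row
    # wrap the dict in a one-row DataFrame and concatenate it onto the existing one
    df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)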