Home>

I'm learning scraping reviews about movies, but getting the first page is the limit, and I don't know how to get all the reviews for a single work. As an example, if you look at the following URL page, there are 182 reviews at the moment. I'd like to get all of these at once, but I don't know how to do this, so please be familiar with scraping.
https://movies.yahoo.co.jp/movie/%E3%82%AA%E3%83%BC%E3%82%B7%E3%83%A3%E3%83%B3%E3%82%BA8/363392/review/

import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup (requests.get (url = URL) .content, "lxml")
links = soup.find_all ("a", class _ = "listview__element--right-icon")
review_urls = []
for link in links:
   review_urls.append (f "{BASE_URL} {link.get ('href')}")
results = []
for review_url in review_urls:
   soup = BeautifulSoup (requests.get (url = review_url) .content, "lxml")
   results.append (soup.find ("p", class _ = "text-small text-break text-readable p1em"). text.strip ())
[print (result) for result in results]
  • Answer # 1

    Terms of Service, Volume 1 Basic Guidelines
    I violate it.

      

    Compliance for using the service
      When using our services, the following actions (including actions that induce them and preparatory actions) are prohibited.
      (7) Acts that use the service for purposes other than the original purpose of service provision in the light of the purpose of provision

      

    Prohibition of reuse of our services
      If you use our services or the data that constitutes them beyond the purpose of providing those services, we will charge you the right to suspend those acts and the amount of profits you have earned from those acts Have the right to

    There are some other conflicts.

  • Answer # 2

    Have you seen the contents of review_urls?

  • Answer # 3

    The pages are different, so you can't get them all at once by scraping.
    You need to scrape the URL of each page individually.