
Some pages load more images when you scroll to the bottom, for example Google image search.
When I crawl these pages with BeautifulSoup, the HTML I get does not contain the newly loaded images.

How can I also crawl the images that are loaded on scroll?

Error message
When the page is first loaded, some of the images are not displayed yet.
I want to crawl all images (*.jpg, *.png) on the page.
Relevant source code
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    URL = "Web page URL to be crawled"
    name = "img"  # Output folder (was undefined in the original code)
    images = []
    soup = BeautifulSoup(requests.get(URL).content, "lxml")
    print(soup)
    for link in soup.find_all("img"):  # Get every img tag
        src = link.get("src")  # src attribute, or None if the tag has none
        if src and src.lower().endswith((".jpg", ".png")):  # Keep only .jpg / .png
            images.append(urljoin(URL, src))  # Resolve relative URLs and store
    os.makedirs(name, exist_ok=True)  # Create the folder if it doesn't exist
    for target in images:  # Download each collected image
        res = requests.get(target)  # "res" avoids shadowing the re module
        with open(os.path.join(name, target.split('/')[-1]), 'wb') as f:  # Store in img folder
            f.write(res.content)  # Write the raw image bytes from .content

I looked through the BeautifulSoup functions and the browser's developer tools for a way to make the images load before crawling,
but I couldn't find a satisfactory answer.

Thank you very much.

Supplemental information (FW/tool version etc.)

BeautifulSoup 4.7.1
Python 3.7.3

  • Answer #1

    The extra content is added when you reach the bottom of the page: the web browser runs the page's JavaScript, which fetches and inserts the additional content. To get that content, you need a tool that can actually execute the JavaScript.
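    A minimal sketch of why BeautifulSoup misses these images, using a made-up page: one image is in the initial HTML and another would only be added by a scroll handler. BeautifulSoup treats the body of a `<script>` tag as plain text and never runs it, so only the first image is found. The built-in "html.parser" is used here instead of lxml so the example needs only bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical page: one image in the initial HTML, another that a browser
# would only add after the user scrolls (via the script below).
html = """
<html><body>
  <img src="/static/first.jpg">
  <script>
    window.addEventListener("scroll", function () {
      var img = document.createElement("img");
      img.src = "/static/more.jpg";
      document.body.appendChild(img);
    });
  </script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# The script is never executed, so only the statically present <img> is found.
srcs = [img.get("src") for img in soup.find_all("img")]
print(srcs)  # ['/static/first.jpg']
```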

    Unfortunately, BeautifulSoup only parses HTML and cannot execute JavaScript, so you will probably need to use Python + Selenium + Chrome.
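    A minimal sketch of that combination, assuming Selenium 4.6 or later (which can locate a matching chromedriver by itself) and a local Chrome install. `rendered_html` and `image_urls` are hypothetical helper names, not part of either library. Chrome runs the page's JavaScript, and the resulting HTML is handed to BeautifulSoup as before. This alone only captures what loads without scrolling.

```python
from bs4 import BeautifulSoup


def image_urls(html):
    """Return the src of every .jpg/.png <img> tag in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        img["src"]
        for img in soup.find_all("img", src=True)  # only tags that have a src
        if img["src"].lower().endswith((".jpg", ".png"))
    ]


def rendered_html(url):
    """Load url in headless Chrome and return the HTML after JavaScript ran."""
    # Imported here so image_urls() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source  # DOM serialized after scripts have run
    finally:
        driver.quit()  # always close the browser
```

    Usage: `image_urls(rendered_html(URL))` returns the .jpg/.png sources; relative ones still need `urljoin` before downloading.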

    Handling infinite scrolling with Selenium is a common question, so if you search Google, Stack Overflow, etc. with keywords like "Python Selenium scrolling", you will find plenty of similar material. How about reading through it and trying it out?
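    The pattern those searches usually turn up looks roughly like this sketch: scroll to the bottom with `execute_script`, wait, and stop once `document.body.scrollHeight` no longer grows. The function name `scroll_to_bottom`, the 2-second `pause` and the `max_rounds` cap are assumptions to tune per site; a fixed sleep is simple but may stop too early on a slow connection.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll until the page height stops growing or max_rounds is reached.

    driver is a Selenium WebDriver (anything with execute_script works).
    Returns how many scrolls made the page grow.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    grew = 0
    for _ in range(max_rounds):
        # Jump to the bottom so the page's JavaScript loads the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new content time to arrive
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded: assume we reached the end
        last_height = new_height
        grew += 1
    return grew
```

    Call it between `driver.get(URL)` and reading `driver.page_source`, then parse that HTML with BeautifulSoup as in the question's code.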