
I am writing a web scraper with BeautifulSoup.
(I am porting a script that was originally written with Selenium to BeautifulSoup.)

Selenium uses a WebDriver to open the link,
whereas BeautifulSoup uses a parser.

My question is how to run it with the browser hidden.
In Selenium, I did it as follows (imports omitted):

options = Options()
options.add_argument('--headless')
browser = webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=options)

How should I write the BeautifulSoup code so that it plays the same role as headless mode here?
Sorry for the trouble, and thank you in advance.

  • Answer #1

    BeautifulSoup is not a library that does anything in a browser in the first place.
    You give it the page source obtained with requests, urllib.request, selenium.webdriver, etc.,
    and it parses that source so you can extract arbitrary elements and edit the markup.
    BeautifulSoup itself never launches a browser, so there is nothing to hide.

    import requests
    from bs4 import BeautifulSoup

    # With requests
    response = requests.get('URL')  # Fetch the page with requests
    soup = BeautifulSoup(response.content, 'html.parser')  # Parse the fetched page source
    # With webdriver (assumes `driver` is an already created webdriver instance)
    html = driver.page_source  # Get the source of the page currently open in the webdriver
    soup = BeautifulSoup(html, 'html.parser')  # Parse the retrieved source
    # From a local file
    file = r'file path'
    soup = BeautifulSoup(open(file), 'html.parser')
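
To make the point concrete, here is a minimal sketch of the parsing step on its own. The HTML string is a made-up stand-in for whatever requests or a webdriver would return. The page content and the variable names `title` and `links` are assumptions for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page source fetched by
# requests, urllib.request, or driver.page_source.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/first">First</a>
    <a href="/second">Second</a>
  </body>
</html>
"""

# BeautifulSoup only parses the string; no browser window is involved.
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                        # text inside <title>
links = [a["href"] for a in soup.find_all("a")]  # href of every <a> tag

print(title)
print(links)
```

Because nothing here opens a browser, there is no headless option to set on the BeautifulSoup side. If the page source still comes from Selenium, the `--headless` argument stays on the webdriver, as in the question.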