
I'm using BeautifulSoup to extract img tags whose alt attribute contains xxxxxx, as follows (driver is a Selenium WebDriver).

import re
from bs4 import BeautifulSoup

html = driver.page_source.encode('utf-8')
soup = BeautifulSoup(html, "html.parser")
elems = soup.find_all('img', alt=re.compile('xxxxxx'))

However, tags containing xxxxxx also appear in places outside the table I'm interested in, so I want to narrow the search range to that table.

Is it possible to extract only the img tags containing xxxxxx that sit inside a particular CSS selector or XPath on the page?
Ideally I'd like to grab the element matched by the selector or XPath once, and then run something like find_all inside it.

  • Answer # 1

    from bs4 import BeautifulSoup

    html = '''
    <body>
    <p>DUMMY1</p>
    <div id="wrapper">
      <p class="target">TARGET1</p>
      <p class="target">TARGET2</p>
    </div>
    <p>DUMMY2</p>
    </body>
    '''
    soup = BeautifulSoup(html, "html.parser")
    # Find the container first, then search only inside it
    wrapper = soup.find('div', id='wrapper')
    target = wrapper.find_all('p', class_='target')
    print(target)
    # [<p class="target">TARGET1</p>, <p class="target">TARGET2</p>]


    How about something like this?
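    Applied to the question, the same two-step pattern might look like the sketch below. The HTML is a hypothetical stand-in for driver.page_source, and the table id "results" is an assumption; use whatever attribute actually identifies your table. The None check is there because find() returns None when nothing matches, which would otherwise surface as an AttributeError on the next line.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source: one matching img lies outside the table.
html = '''
<body>
<img src="logo.png" alt="xxxxxx logo">
<table id="results">
  <tr><td><img src="row1.png" alt="xxxxxx 1"></td></tr>
  <tr><td><img src="row2.png" alt="xxxxxx 2"></td></tr>
</table>
</body>
'''
soup = BeautifulSoup(html, "html.parser")

# Step 1: narrow the search range to the table ("results" is an assumed id).
table = soup.find("table", id="results")
if table is None:
    raise ValueError("target table not found")  # find() returns None on no match

# Step 2: find_all on the Tag searches only that table's descendants.
elems = table.find_all("img", alt=re.compile("xxxxxx"))
print([img["src"] for img in elems])  # ['row1.png', 'row2.png']
```

    Calling soup.find_all with the same arguments would also return logo.png; calling it on the table Tag does not.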

  • Answer # 2

    Quoting the question: "Ideally I'd like to grab the element matched by the selector or XPath once, and then run something like find_all inside it."

    You can do exactly what you wrote.
    You want to locate one element first and then search only among its descendants, right?
    Just write it that way.
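    A minimal sketch of "locate once, then find_all inside it" using a CSS selector. The markup and the "#target" selector are hypothetical. select_one() returns the first matching Tag, and find_all() on that Tag only looks at its descendants. The last step shows an alternative: a single select() call whose descendant combinator does the scoping and whose [alt*="..."] attribute selector does the substring filtering.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page: one matching img inside the target table, one outside it.
html = '''
<body>
<img src="a.png" alt="xxxxxx outside">
<table id="target">
  <tr><td><img src="b.png" alt="xxxxxx inside"></td></tr>
  <tr><td><img src="c.png" alt="other"></td></tr>
</table>
</body>
'''
soup = BeautifulSoup(html, "html.parser")

# Step 1: locate the container once with a CSS selector ("#target" is an assumed id).
container = soup.select_one("#target")

# Step 2: search only the container's descendants, just like soup.find_all.
inside = container.find_all("img", alt=re.compile("xxxxxx"))

# Alternative: one selector that scopes and filters at once;
# [alt*="xxxxxx"] matches alt attributes containing the substring.
inside_css = soup.select('#target img[alt*="xxxxxx"]')

print([img["src"] for img in inside])      # ['b.png']
print([img["src"] for img in inside_css])  # ['b.png']
```

    On the XPath side: BeautifulSoup does not evaluate XPath expressions. Since driver is Selenium here, one option (assuming the Selenium 4 API) is to locate the region with driver.find_element(By.XPATH, xpath), read its HTML with get_attribute("outerHTML"), and parse that fragment with BeautifulSoup before calling find_all on it.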

  • Answer # 3

    In the end, passing the selector to soup.select worked.
    Here is the selector:

    soup.select('body>table>tbody>tr>td>table:nth-of-type(2)>tbody>tr>td>table:nth-of-type(3)>tbody>tr:nth-of-type(1)>td')

    At first I took the selector from Chrome's Copy Selector, replaced nth-child with nth-of-type so BeautifulSoup would accept it, and used that, but it didn't work: something other than what I expected was selected.

    In body>table>tbody>tr>td>table:nth-of-type(2)>... (rest omitted), changing the (2) in table:nth-of-type(2) to (1) selects the expected location.
    In other words, you can't simply replace nth-child(2) in a selector copied with Chrome's Copy Selector by nth-of-type(2).
    The structure itself is captured correctly, but the nth-child index apparently doesn't carry over unchanged.
    It's inconvenient to have to check the nth-of-type index every time...
    While looking into how the nth index in Chrome's copied selector relates to the nth index in a selector given to BeautifulSoup, I came across the following page.

    https://lmn-blog.com/nth-of-type01/
    I hesitate to just link someone else's page, but its explanation made it clear to me. In table:nth-child(x), x is the element's position among all sibling elements under the same parent, including elements other than table; in table:nth-of-type(x), x counts only the table elements.
    The numbers shift here.
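    The counting difference can be reproduced with a small hypothetical structure: a p element sits before two table siblings under the same td. This sketch assumes BeautifulSoup 4.7.0 or later, where select() is backed by Soup Sieve and accepts :nth-child as well as :nth-of-type; the markup and ids are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical structure: a <p> sits before two <table> siblings under one <td>.
html = '''
<table><tr><td>
  <p>caption</p>
  <table id="first"><tr><td>1</td></tr></table>
  <table id="second"><tr><td>2</td></tr></table>
</td></tr></table>
'''
soup = BeautifulSoup(html, "html.parser")

# nth-child counts ALL element siblings: <p> is child 1, so the first table is child 2.
by_child = soup.select_one("td > table:nth-child(2)")
# nth-of-type counts only <table> siblings: the first table is table 1.
by_type_1 = soup.select_one("td > table:nth-of-type(1)")
# Copying the "2" from nth-child into nth-of-type lands on the other table.
by_type_2 = soup.select_one("td > table:nth-of-type(2)")

print(by_child["id"], by_type_1["id"], by_type_2["id"])  # first first second
```

    This mirrors the observation above: Chrome's nth-child(2) corresponds to nth-of-type(1) here, because one non-table sibling precedes the tables.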
    If BeautifulSoup supported nth-child, or if Chrome's Copy Selector produced nth-of-type, the copied selector would work in one go. Otherwise, unless you verify it or count the sibling elements at that level of the structure, the expected element may not be selected.
    I learned a lot.
