
I want to get the shipping cost from Yahoo! Auctions using Python.
I only recently started learning programming, so please forgive me if I use any terms incorrectly.

[HTML]

"¥ 2,980"
(tax included)

This is the part of the target page's HTML that contains the shipping cost.
Page URL:

I tried to scrape the shipping cost with a CSS selector, but it could not be retrieved.

Error message
[Execution result]
IndexError: list index out of range
Applicable source code
import requests
import lxml.html
response = requests.get(URL)
HTML = lxml.html.fromstring(response.text)
# SOURYOU = shipping cost
SOURYOU = HTML.cssselect('#method0>div')[0].text.strip()
print(SOURYOU)
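
The IndexError itself only says that cssselect() returned an empty list, so [0] had nothing to index: the selector matched nothing in the HTML that requests downloaded. As a minimal sketch (first_text is a hypothetical helper name, not part of lxml), the lookup can be guarded so that a missing element yields None instead of an exception:

```python
def first_text(matches, default=None):
    """Return the stripped .text of the first matched element, or `default`.

    `matches` is the list returned by a call such as HTML.cssselect(selector).
    """
    if not matches:
        # Empty list: the selector matched nothing in the parsed HTML.
        return default
    text = matches[0].text
    # .text is None when the element has no text before its first child.
    return text.strip() if text is not None else default
```

With the variables above it would be called as first_text(HTML.cssselect('#method0>div')). A result of None then means the element is not present in response.text.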

The Buy It Now price "4,350 yen" on the same page is stored in almost the same structure, so I ran the following code as a test, and "4,350 yen" was obtained.

[HTML including the Buy It Now price]


Buy It Now price

</dt>

"4,350 yen"
(tax: 0 yen)</span>
</dd>

import requests
import lxml.html
response = requests.get(URL)
HTML = lxml.html.fromstring(response.text)
# KAKAKU = price
KAKAKU = HTML.cssselect('#l-sub>div.ProductInformation>ul>li.ProductInformation__item.js-stickyNavigation-start>div>dl>dd.Price__value')[0].text.strip()
print(KAKAKU)

Since the shipping cost and the Buy It Now price have almost the same structure, I cannot understand why the Buy It Now price of "4,350 yen" can be obtained while the shipping cost of "2,980 yen" cannot.
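
One way to narrow this down is to check whether the id used in each selector appears at all in the HTML that requests received. The sketch below uses only the standard library; IdCollector and ids_in are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in(html_text):
    """Return the set of element ids found in html_text."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.ids


# With the variables from the code above (not run here):
#   found = ids_in(response.text)
#   print('l-sub' in found, 'method0' in found)
```

If 'l-sub' is found but 'method0' is not, the shipping-cost element is simply not in the downloaded HTML, and no selector can match it.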

Thanks for your support.

Supplemental information (FW/tool version etc.)

Python 3.8.0

Microsoft Windows 10
Version 1903


  • Answer #1

    Because the price is the text of the div element itself, the following is enough:

    import lxml.html
    HTML = lxml.html.fromstring("""
    "1,280 yen"
    (tax included)
    """)
    y = HTML.cssselect('#method0>div')[0].text.strip()


    This works, as the following session shows:

    >>> import lxml.html
    >>> HTML = lxml.html.fromstring("""
    ...
    ...
    ...
    ... "1,280 yen"
    ... (tax included)
    ...
    ... """)
    >>> y = HTML.cssselect('#method0>div')[0].text.strip()
    >>> print(y)
    "1,280 yen"
    >>> print(y[1:-1])
    1,280 yen

  • Answer #2

    The shipping cost is only displayed after you click the "Details" section on the page.
    It seems that it cannot be obtained because it is generated by JavaScript.

    It seems you can get the HTML as it appears after the JavaScript has run by driving Google Chrome from Python with a library called Selenium.

    Reference blog:
    Python Web Scraping Technique Collection: "There Is No Value That Cannot Be Obtained" (JavaScript support)
    [https://qiita.com/Azunyan1111/items/b161b998790b1db2ff7a]
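
The approach suggested in Answer #2 could look roughly like the sketch below. It assumes Selenium and a Chrome driver are installed. The name rendered_html, its parameters, and the idea of passing a click_selector for the "Details" section are hypothetical, and the actual selector for that section would have to be looked up on the page. The function only uses get(), find_elements() and page_source on the driver and polls in a plain loop; Selenium also provides WebDriverWait for the same purpose.

```python
import time

# Same string as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS_SELECTOR = "css selector"


def _wait_for(driver, selector, timeout, poll):
    """Poll until `selector` matches an element, then return that element."""
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements(CSS_SELECTOR, selector)
        if found:
            return found[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{selector!r} did not appear within {timeout}s")
        time.sleep(poll)


def rendered_html(driver, url, wait_selector, click_selector=None,
                  timeout=10.0, poll=0.5):
    """Open `url` and return the page HTML once `wait_selector` matches.

    driver:         a Selenium WebDriver, e.g. webdriver.Chrome()
    click_selector: optional CSS selector of an element to click first
                    (hypothetical; depends on the actual page)
    """
    driver.get(url)
    if click_selector is not None:
        _wait_for(driver, click_selector, timeout, poll).click()
    # Wait until the JavaScript-generated element is present in the DOM.
    _wait_for(driver, wait_selector, timeout, poll)
    return driver.page_source


# Usage (needs Selenium and a Chrome driver; not run here):
#   from selenium import webdriver
#   import lxml.html
#   driver = webdriver.Chrome()
#   try:
#       html = rendered_html(driver, URL, '#method0>div')
#   finally:
#       driver.quit()
#   HTML = lxml.html.fromstring(html)
```

The returned string can then be parsed with lxml.html.fromstring() in the same way as response.text in the question.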
