I want to look up the publisher name from a book title and collect the results in a Google spreadsheet. I'm working in Google Colab.
I previously tried to achieve the same thing with the National Diet Library API (https://www.tutorialfor.com/go.php?id=304423), but that proved difficult, so I would now like to solve it by scraping.
The code below is largely based on https://qiita.com/Azunyan1111/items/b161b998790b1db2ff7a.
Error message
https://honto.jp/netstore/search.html?gnrcd=1&k=%E3%82%A8%E3%83%BC%E3%82%B9%E8%96%AC%E7%90%86%E5%AD%A6&extSiteId=junkudo&cid=eu_hb_jtoh_0411&srchf=1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-35-9e745ffb8004> in <module>()
     35
     36 # Display the text at the specified location using CSS selectors
---> 37 print(soup.select_one("#displayOrder1>div>div.stInfo>div.stContents>ul>li:nth-child(4)>a").text)
     38 publisher = soup.select_one("#displayOrder1>div>div.stInfo>div.stContents>ul>li:nth-child(4)>a").text
     39 # Write the result to the sheet

AttributeError: 'NoneType' object has no attribute 'text'

Corresponding source code
import numpy as np
from pandas import DataFrame
import xml.etree.ElementTree as ET
import requests
from collections import defaultdict
from google.colab import files
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

# gc is assumed to be an authorized gspread client created in an earlier Colab cell
ss_url = "https://docs.google.com/spreadsheets/d/ooooooo"
workbook = gc.open_by_url(ss_url)
worksheet = workbook.get_worksheet(1)
cell_list = worksheet.range("A3:A5")

for cell in cell_list:
    # Search term
    title = cell.value
    # URL-encode the search term
    search_word = urllib.parse.quote(title)
    # URL to access
    url = "https://honto.jp/netstore/search.html?gnrcd=1&k=" + search_word + "&extSiteId=junkudo&cid=eu_hb_jtoh_0411&srchf=1"
    print(url)
    # Access the URL; the return value is an instance holding the response, including the HTML
    instance = urllib.request.urlopen(url)
    # Extract the HTML from the instance and parse it with BeautifulSoup
    soup = BeautifulSoup(instance, "html.parser")
    # Display the text at the specified location using a CSS selector
    print(soup.select_one("#displayOrder1>div>div.stInfo>div.stContents>ul>li:nth-child(4)>a").text)
    publisher = soup.select_one("#displayOrder1>div>div.stInfo>div.stContents>ul>li:nth-child(4)>a").text
    # Write the result to the sheet
    worksheet.update_cell(cell.row, cell.col + 1, publisher)

What I tried
I tried both of the following, but the result does not change.
print(soup.select_one("#displayOrder1>div>div.stInfo>div.stContents>ul>li:nth-child(4)>a").text)
print(soup.select_one("#displayOrder1>div>div.stInfo>div.stContents>ul>li:nth-child(4)>a").string)
I looked into the error, but the cause may be a different problem.
I'm also looking into using Selenium, but I'm not sure whether it would lead to a solution.
Answer #1
The result of soup.select_one(...) is None.
That is, the specified node does not exist in the HTML that was parsed.
Print the raw HTML with print(urllib.request.urlopen(url).read()) and review it carefully.
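As a rough sketch of that check, the helper below returns None instead of raising AttributeError when nothing matches, and a second helper reports the first step of the selector chain that matches nothing. The function names and the sample HTML are made up for illustration; the sample is not the real honto.jp markup, so the selector may well need to change for the actual page.

```python
from bs4 import BeautifulSoup

SELECTOR = "#displayOrder1 > div > div.stInfo > div.stContents > ul > li:nth-child(4) > a"


def extract_text(html, selector=SELECTOR):
    """Return the stripped text of the first match, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)
    if node is None:
        # select_one returns None when the node is absent; do not touch .text
        return None
    return node.get_text(strip=True)


def first_failing_prefix(html, selector=SELECTOR):
    """Return the shortest prefix of the selector chain that matches nothing.

    Returns None when the whole chain matches. The naive split on ">" is only
    suitable for simple child-combinator chains like the one above.
    """
    soup = BeautifulSoup(html, "html.parser")
    steps = [step.strip() for step in selector.split(">")]
    for i in range(1, len(steps) + 1):
        prefix = " > ".join(steps[:i])
        if soup.select_one(prefix) is None:
            return prefix
    return None


# Hypothetical HTML for illustration only; not the real honto.jp markup.
sample_html = """
<div id="displayOrder1"><div><div class="stInfo"><div class="stContents">
<ul><li>a</li><li>b</li><li>c</li><li><a href="#">Example Publisher</a></li></ul>
</div></div></div></div>
"""

print(extract_text(sample_html))
print(first_failing_prefix("<div id='displayOrder1'></div>"))
```

In the loop from the question, a None result could be written to the sheet as an empty string, or skipped, so that one missing title does not stop the whole run.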
If you write the code based on what you see in the browser's developer tools, you may be targeting nodes that are not in the HTML your script actually receives, such as:
- a node inside a frame
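To see whether frames are involved, one option is to list the frame sources found in the fetched HTML. This is only a sketch: the function name, the base URL, and the sample HTML are assumptions for illustration, and nothing in this thread confirms that honto.jp actually serves the search results in a frame.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def frame_sources(html, base_url):
    """Return absolute URLs of every <iframe>/<frame> that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    sources = []
    for tag in soup.find_all(["iframe", "frame"]):
        src = tag.get("src")
        if src:
            # Resolve relative paths against the URL of the outer page
            sources.append(urljoin(base_url, src))
    return sources


# Hypothetical page for illustration only.
sample_html = '<html><body><iframe src="/inner/list.html"></iframe><p>x</p></body></html>'
print(frame_sources(sample_html, "https://example.com/search.html"))
```

If the list is not empty, the content of each frame is a separate document, so it has to be fetched and parsed on its own before a selector can find nodes inside it.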