As the title says.
When I run the following code, it outputs None.
    import urllib.request
    from bs4 import BeautifulSoup


    class Scraper:
        def __init__(self, site):
            self.site = site

        def scrape(self):
            r = urllib.request.urlopen(self.site)
            html = r.read()
            parser = "html.parser"
            sp = BeautifulSoup(html, parser)
            for tag in sp.find_all("a"):
                url = tag.get("html")
                if url is None:
                    print(url)
                    continue
                if "html" in url:
                    print("\n" + url)


    news = "http://news.google.com/"
    Scraper(news).scrape()
When I tried it with the URL of a site I usually read, instead of Google News, I was able to scrape it.
I also tried changing url = tag.get("html") to url = tag.get("articles"), but the output was still None.
Please advise.
Answer #1
I made a mistake while fiddling with the code.
tag.get("href") was correct, not tag.get("html").
The check
if "html" in url:
did not work either. I had written it following the book "Self-study Programmer".
    import urllib.request
    from bs4 import BeautifulSoup


    class Scraper:
        def __init__(self, site):
            self.site = site

        def scrape(self):
            response = urllib.request.urlopen(self.site)
            html = response.read()
            soup = BeautifulSoup(html, "html.parser")
            for tag in soup.find_all("a"):
                url = tag.get("href")
                if url and "article" in url:
                    print("\n" + "https://news.google.com/" + url)


    Scraper("https://news.google.com/").scrape()
I don't know whether this code does what the book intended, but it does output URLs that can actually be accessed.
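For anyone wondering why the first version printed None: Tag.get looks up an HTML attribute by name, and an a element has no attribute called html, so the lookup returns None for every link. Below is a minimal sketch of that behavior on a made-up HTML snippet. SAMPLE_HTML and the helper name extract_article_links are hypothetical and not from the book, and the real Google News markup will differ. It also uses urllib.parse.urljoin in place of string concatenation to turn relative hrefs into absolute URLs, which is one possible choice rather than a required one.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; the real markup will differ.
SAMPLE_HTML = """
<a href="./articles/abc123">Story</a>
<a href="/settings">Settings</a>
<a name="top">No href here</a>
"""


def extract_article_links(html, base_url, keyword="article"):
    """Return absolute URLs of <a> tags whose href contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag in soup.find_all("a"):
        href = tag.get("href")  # None when the tag has no href attribute
        if href and keyword in href:
            # urljoin resolves relative paths such as "./articles/..."
            links.append(urljoin(base_url, href))
    return links


first = BeautifulSoup(SAMPLE_HTML, "html.parser").find("a")
print(first.get("html"))  # attribute lookup for a name that <a> does not have
print(first.get("href"))  # the attribute that actually holds the link
print(extract_article_links(SAMPLE_HTML, "https://news.google.com/"))
```

To try it on the live page, pass urllib.request.urlopen(site).read() as the html argument. Whether "article" is still the right keyword depends on the page's current markup.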
Answer #2
This is probably code from some reference book; the same question about this code comes up often on Stack Overflow.
Conclusion
The specification on the http://news.google.com/ side changed at some point.
As a result, the code no longer works correctly as written.
The code itself is not at fault.
Because of the specification change on the target page, it simply cannot be used as is.
Reference: Stack Overflow, "Web scraping code does not work"
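When a target page changes like this, one way to find a filter that works again is to look at what the href values look like now. Here is a rough sketch that groups links by their leading path segment. The helper name href_prefix_counts and the sample HTML are hypothetical, and the grouping rule is only an assumption about what is useful to inspect.

```python
from collections import Counter

from bs4 import BeautifulSoup


def href_prefix_counts(html, depth=1):
    """Count hrefs by their leading path segment(s) to reveal link patterns."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for tag in soup.find_all("a", href=True):  # skip <a> tags without href
        # Drop empty and "." segments so "./articles/x" groups as "articles".
        parts = [p for p in tag["href"].split("/") if p not in ("", ".")]
        counts["/".join(parts[:depth]) or "(root)"] += 1
    return counts


# Made-up markup for illustration only.
sample = (
    '<a href="./articles/a1">A</a>'
    '<a href="./articles/a2">B</a>'
    '<a href="./topics/t1">C</a>'
    "<a>no href</a>"
)
counts = href_prefix_counts(sample)
for prefix, n in counts.most_common():
    print(prefix, n)
```

Running the same function on the HTML returned by urllib.request.urlopen(site).read() shows which prefixes dominate, which can suggest a replacement for the old "html" in url check.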