
I built a program using the Google Custom Search API to collect images.
However, it fails with the following error. I would appreciate any advice on how to fix it.

Environment
macOS High Sierra 10.13.6
Python 3.6

"Error"

Traceback (most recent call last):
  File "google_api.py", line 19, in <module>
    items_json = requests.get(API_PATH, PARAMS).json()["items"]
KeyError: 'items'


"Source"

import requests
import shutil

API_PATH = "https://www.googleapis.com/customsearch/v1"
PARAMS = {
  "cx": "xxxx:xxxx",      # search engine ID
  "key": "xxxxxxxxx",     # API key
  "q": "movie",           # search word
  "searchType": "image",  # search type
  "start": 1,             # start index
  "num": 10               # number of results per search (10 by default)
}
LOOP = 100
image_idx = 0
for x in range(LOOP):
  PARAMS.update({'start': PARAMS["num"] * x + 1})
  items_json = requests.get(API_PATH, PARAMS).json()["items"]
  for item_json in items_json:
    path = "imgs/" + str(image_idx) + ".png"
    r = requests.get(item_json['link'], stream=True)
    if r.status_code == 200:
      with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
      image_idx += 1
  • Answer #1

      

    KeyError: 'items'

    LOOP = 100
    image_idx = 0

    for x in range(LOOP):

    You have reached the API's upper limit of search results (100), so it no longer returns any results, and the response no longer contains an "items" key.
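
    As a minimal sketch of why the KeyError appears (the dicts below are simplified stand-ins for real API responses, not actual output): a page past the limit simply has no "items" key, so using .get() with a default avoids the crash:

    ```python
    # Sketch: stop paging gracefully when the response no longer has "items".
    # The response dicts here are simplified stand-ins for real API JSON.
    def extract_items(response_json):
        """Return the result list, or an empty list when the page is empty."""
        return response_json.get("items", [])

    # A normal page, and an exhausted page past the ~100-result limit:
    page_with_results = {"items": [{"link": "https://example.com/a.png"}]}
    page_past_limit = {"error": {"code": 400}}  # no "items" key at all

    assert extract_items(page_with_results)[0]["link"] == "https://example.com/a.png"
    assert extract_items(page_past_limit) == []  # no KeyError raised
    ```

    An empty return value then lets the download loop simply do nothing and the outer loop break out cleanly.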

    Please check the status code by calling raise_for_status() on the response object returned by requests.get.

    Sample code. (Not tested)

    from logging import getLogger, StreamHandler, Formatter, DEBUG
    import requests
    import shutil

    LOGGER = getLogger('custom_search_api')
    HANDLER = StreamHandler()
    HANDLER.setLevel(DEBUG)
    HANDLER.setFormatter(Formatter('%(message)s'))
    LOGGER.setLevel(DEBUG)
    LOGGER.addHandler(HANDLER)

    def fetch(url: str, params: dict = None):
        res = requests.get(url, params)
        res.raise_for_status()
        return res

    API_PATH = "https://www.googleapis.com/customsearch/v1"
    start_index = 1
    PARAMS = {
      "cx": "xxxx:xxxx",      # search engine ID
      "key": "xxxxxxxxx",     # API key
      "q": "movie",           # search word
      "searchType": "image",  # search type
      "start": start_index,   # start index
      "num": 10               # number of results per search (10 by default)
    }
    for _ in range(10):  # 10 * 10 = 100
        res = fetch(API_PATH, PARAMS)
        LOGGER.info('#' * 80)
        res_json = res.json()
        for idx, item in enumerate(res_json['items'], start=start_index):
            path = "imgs/" + str(idx) + ".png"
            download_link = item['link']
            LOGGER.info(f'url: {download_link}')
            r = requests.get(download_link, stream=True)
            if r.status_code == 200:
                with open(path, 'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw, f)
        start_index = res_json['queries']['nextPage'][0].get('startIndex')
        LOGGER.info(f'next: {start_index}')
        PARAMS['start'] = start_index

    Furthermore, the free tier has a daily limit of 100 queries.
    Reference: Summary of image collection on Yahoo, Bing and Google

    Large search engines have tight restrictions on image search.


    Since the point of my answer may not have come across well, a supplementary explanation:
    the code in the question does not take num into account properly, so it cannot process the 100 queries correctly.

      

    "num": 10  # number of results per search (10 by default)

    wandbox
    Please refer to the code in my answer and change it to fetch 10 results per query * 10 queries = 100 in total.
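
    The paging arithmetic above can be sketched as follows (num = 10 as in the code; the list of start indices is just illustrative):

    ```python
    # With num=10 results per query, the valid start indices are 1, 11, ..., 91:
    # 10 queries * 10 results = 100, the maximum the API will return.
    num = 10
    start_indices = [num * i + 1 for i in range(10)]
    print(start_indices)  # [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]

    # Continuing past start=91 (as the question's 100-iteration loop does,
    # reaching start=101 and beyond) asks for results past the limit,
    # and the "items" key disappears from the response.
    assert start_indices[-1] + num - 1 == 100
    ```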


    Scraping is heavily dependent on the behavior of the remote site.
    If an exception must not abort the run, wrap the code in try/except.
    If you search for "python retry", you will find various approaches.


    1. Perform either of the following on the response object returned by requests.get,
    and see whether the problem can be detected before the error in the question occurs.

    1-a. Call raise_for_status(), wrapped in try-except (as in my answer):

    try:
        res = requests.get(url, params)
        res.raise_for_status()
    except Exception as ex:
        LOGGER.exception(ex)

    1-b. Check the HTTP status code via status_code:

    res = requests.get(url, params)
    print(res.status_code)

    2. Add retry processing.
    If you are using requests, you can use from urllib3.util.retry import Retry.
    Reference: Retrying KeyError
    3. Also, please add an appropriate sleep to reduce the load on the remote server.
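
    Points 2 and 3 might be combined as in the following sketch; it wires urllib3's Retry into requests through an HTTPAdapter, and API_PATH/PARAMS refer to the variables from the earlier code (the retry counts and sleep interval are illustrative choices, not requirements):

    ```python
    import time  # used in the sleep example at the bottom
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    # Point 2: retry transient failures (connection errors, 429/5xx responses)
    # automatically instead of crashing on the first hiccup.
    retry = Retry(
        total=3,                                     # at most 3 retries per request
        backoff_factor=1.0,                          # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # HTTP statuses worth retrying
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))

    def fetch_page(url: str, params: dict) -> dict:
        """Fetch one page of results, raising on HTTP errors."""
        res = session.get(url, params=params)
        res.raise_for_status()
        return res.json()

    # Point 3: sleep between queries to reduce load on the remote server, e.g.:
    # for start in range(1, 92, 10):
    #     page = fetch_page(API_PATH, {**PARAMS, "start": start})
    #     time.sleep(1)
    ```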