Initial question

I want to look up the publisher name from a book title and collect the results in a Google spreadsheet. I'm working in Colab, but downloading the results as a CSV and pasting them into the spreadsheet manually would also be fine.

Either Python or GAS is fine, and a solution that uses the Rakuten Books API or something similar would also be welcome.

Some books return no search results
The book titles that fail are:

  • Artificial respiration management based on the pathology of Dr. Ryoma
  • Was it something like this! !! Oxygen therapy
  • Ace pharmacology
  • I wanted to know that addiction medical treatment

and so on. When I changed the search term to "Natsume Soseki" or "artificial intelligence", the search returned results.

Corresponding source code

(Deleted due to character limit)

What I tried

http://ailaby.com/ndl_search/
My code is almost identical to the code on this site.

I also tried the Rakuten Books API, but I'm stuck because the only approach I could find online searches by ISBN.

2020/11/16 postscript

Based on what you taught me, I wrote code that writes the results to the sheet, but it still doesn't work. I would appreciate any advice.

Changes
  • Get only the contents of the cell
  • Wrote my own code to write the results to the sheet
  • The author of the original code collected multiple records for a single book and combined them in a df. In my case I want one record (publisher name, author name, etc.) for each of several books written to the sheet, so I edited the code myself.
---------------------------------------------
Search results for cnt = 1 mediatype = 1 title = I wanted to know that addiction medical treatment from = 19800101
---------------------------------------------
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-ed4b9090a8a1> in <module>()
    109
    110       # Reflect in sheet
--> 111       worksheet.update_cell(cell.row, cell.col + 1, publisher.text)
    112       worksheet.update_cell(cell.row, cell.col + 2, item.find('title').text)
    113       worksheet.update_cell(cell.row, cell.col + 3, author.text)
NameError: name 'publisher' is not defined
Corresponding source code
import numpy as np
from pandas import DataFrame
import xml.etree.ElementTree as ET
import requests
from collections import defaultdict
from google.colab import files
ss_url = "https://docs.google.com/spreadsheets/d/oooooo"
workbook = gc.open_by_url(ss_url)
worksheet = workbook.get_worksheet(1)
cell_list = worksheet.range("A4:A10")
for cell in cell_list:
  # Search conditions
  params = {}
  params['title'] = cell.value
  params['mediatype'] = '1'
  params['from'] = '1980-01-01'
  params['cnt'] = '1'
  params['idx'] = '1'
  list_map = defaultdict(list)
  total = 0
  # Session
  s = requests.session()
  while True:
      # Search request
      # Parse the XML
      root = ET.fromstring(r.text.encode('utf-8'))
      print('---------------------------------------------')
      print(root.find('channel').find('description').text)
      print('---------------------------------------------')
      items = root.findall('.//item')
      for i, item in enumerate(items):
          print('--------' + str(total + i + 1) + '---------')
          # Title
          print(item.find('title').text)
          list_map['title'].append(item.find('title').text)
          # ID
          # Extracted from the text of the link tag
          # Example
          # R100000001-I022140205-00
          #
          link = item.find('link').text
          print('' + link[link.rfind('/') + 1:])
          list_map['ID'].append(link[link.rfind('/') + 1:])
          # Author
          # Various formats
          # Example
          #   ・ Natsume Soseki,
          #   ・ Natsume Soseki,
          #   ・ Natsume, Soseki, 1867-1916,
          #   ・ Natsume Soseki/work,
          #
          author = item.find('author')
          if author is not None:
              print('' + author.text)
              list_map['author'].append(author.text)
          else:
              list_map['author'].append('')
          # Publication date
          # Example: Fri, 23 Jun 1995 09:00:00 +0900
          pubDate = item.find('pubDate')
          if pubDate is not None:
              print('' + pubDate.text)
              list_map['pubDate'].append(pubDate.text)
          else:
              list_map['pubDate'].append('')
          # Year of issue
          # Take the oldest year, since a multi-volume set can have several
          issueds = item.findall('{http://purl.org/dc/terms/}issued')
          lst = [issued.text for issued in issueds]
          if len(lst) > 0:
              print('' + lst[np.argmin(lst)])
              list_map['issued'].append(lst[np.argmin(lst)])
          else:
              list_map['issued'].append('')
          # Series title
          # For paperbacks, this holds the name of the paperback series
          if seriesTitle is not None:
              print('' + seriesTitle.text)
              list_map['seriesTitle'].append(seriesTitle.text)
          else:
              list_map['seriesTitle'].append('')
          # Publisher
          publisher = item.find('{http://purl.org/dc/elements/1.1/}publisher')
          if publisher is not None:
              print('' + publisher.text)
              list_map['publisher'].append(publisher.text)
          else:
              list_map['publisher'].append('unknown')
      # Reflect in sheet
      worksheet.update_cell(cell.row, cell.col + 1, publisher.text)
      worksheet.update_cell(cell.row, cell.col + 2, item.find('title').text)
      worksheet.update_cell(cell.row, cell.col + 3, author.text)

      cnt = int(params['cnt'])
      idx = int(params['idx'])
      if len(items) < cnt:
          break

      # df = DataFrame({'title': list_map['title'],
      #                 'ID': list_map['ID'],
      #                 'author': list_map['author'],
      #                 'pubDate': list_map['pubDate'],
      #                 'issued': list_map['issued'],
      #                 'seriesTitle': list_map['seriesTitle'],
      #                 'publisher': list_map['publisher']},
      #                columns=['title', 'ID', 'author', 'pubDate', 'issued', 'seriesTitle', 'publisher'])
      # df.to_csv("books.csv", encoding='utf-8')
      # files.download('books.csv')
      # df
  • Answer # 1

    The params passed to get are being converted to a string, but they should be passed as the dictionary itself.
    As written, I don't think the parameters end up in the request URL.

     # Search request
     r = s.get('http://iss.ndl.go.jp/api/opensearch', params=str(params['title']))

    Fix
    Pass params as-is to the params argument.

     # Search request
     r = s.get('http://iss.ndl.go.jp/api/opensearch', params=params)

    With cell_list = worksheet.range("A4:A5"), each element also carries cell information, so when you refer to it, read the text from its value attribute.

     params['title'] = cell.value
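
    To illustrate the difference between passing the dictionary and passing a string, here is a minimal sketch that uses only the standard library. It assumes that requests encodes a dictionary given as params in roughly the same way as urllib.parse.urlencode, and that a plain string is appended to the URL unchanged. The build_url helper and the sample title are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = 'http://iss.ndl.go.jp/api/opensearch'


def build_url(params):
    """Hypothetical helper: show the URL a dict of search conditions produces."""
    return BASE_URL + '?' + urlencode(params)


params = {'title': 'Ace pharmacology', 'mediatype': '1', 'cnt': '1', 'idx': '1'}

# Passing the whole dictionary keeps every key=value pair in the query string.
url_from_dict = build_url(params)
print(url_from_dict)

# Passing only str(params['title']) leaves a bare value with no key names,
# so the server cannot tell which parameter the text belongs to.
url_from_str = BASE_URL + '?' + str(params['title'])
print(url_from_str)

# Decode the query string again to confirm the keys survived.
query = parse_qs(urlsplit(url_from_dict).query)
print(query)
```

    Printing the URL like this before sending the request is a quick way to check that title, cnt, and the other conditions are really present.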

    The variable publisher is referenced outside the for loop.
    Since publisher is assigned inside the for loop, the code that uses it must be indented to the same level as the body of the for loop.

              else:
                  list_map['publisher'].append('unknown')
              # The indentation is missing here, so align it with the body of the for loop
              # Reflect in sheet
              worksheet.update_cell(cell.row, cell.col + 1, publisher.text)
              worksheet.update_cell(cell.row, cell.col + 2, item.find('title').text)
              worksheet.update_cell(cell.row, cell.col + 3, author.text)
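
    The indentation point can be sketched as follows. This is a minimal example run against a hand-written XML sample, not a real API response. SAMPLE_XML, text_or_default, and extract_rows are hypothetical names; only the publisher namespace is taken from the question's code. Keeping every use of publisher inside the loop also means nothing refers to it when a search returns no items.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified tag name, as used in the question's code.
DC_PUBLISHER = '{http://purl.org/dc/elements/1.1/}publisher'

# Hand-written sample for illustration only; not a real API response.
SAMPLE_XML = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <description>sample search results</description>
    <item>
      <title>Sample Book A</title>
      <author>Sample Author</author>
      <dc:publisher>Sample Press</dc:publisher>
    </item>
    <item>
      <title>Sample Book B</title>
    </item>
  </channel>
</rss>"""


def text_or_default(element, default=''):
    """Return element.text, or default when the tag is missing or empty."""
    if element is None or element.text is None:
        return default
    return element.text


def extract_rows(xml_text):
    """Build one (publisher, title, author) tuple per <item>."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall('.//item'):
        # Everything that uses item-level variables stays inside this loop.
        publisher = text_or_default(item.find(DC_PUBLISHER), 'unknown')
        title = text_or_default(item.find('title'))
        author = text_or_default(item.find('author'))
        rows.append((publisher, title, author))
    return rows


rows = extract_rows(SAMPLE_XML)
print(rows)

# With no <item> elements the loop body never runs and the result is empty,
# so there is nothing to write and no undefined variable to reference.
empty_rows = extract_rows('<rss><channel></channel></rss>')
print(empty_rows)
```

    Each tuple could then be written with worksheet.update_cell, as in the question's code, and an empty result can be skipped or written as 'unknown'.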

    I checked the search API.
    https://iss.ndl.go.jp/information/wp-content/uploads/2020/03/ndlsearch_api_20200302_jp.pdf
    When the date is specified down to the month and day, records that only have a year do not seem to match.
    Comment out the 'from' parameter.

    # params['from'] = '1980-01-01'


    Also, what is the code below intended to do?
    When the search succeeds, len(items) equals cnt, so the break never runs.
    The while loop then becomes an infinite loop.
    Please check.

     cnt = int(params['cnt'])
     idx = int(params['idx'])
     if len(items) < cnt:
         break
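
    One way to make the loop terminate, sketched under assumptions: advance idx by cnt after each page and stop when a page comes back short or empty. The fetch_page callback stands in for the real search request, and fake_fetch is a hypothetical in-memory replacement so the example runs without network access. The max_pages cap is an extra safety limit added for this sketch, not something from the question's code.

```python
def collect_all(fetch_page, cnt, max_pages=100):
    """Collect items page by page until a short or empty page is returned."""
    idx = 1  # 1-based start position, like params['idx'] in the question
    collected = []
    for _ in range(max_pages):
        items = fetch_page(idx, cnt)
        collected.extend(items)
        if len(items) < cnt:
            break  # last page reached
        idx += cnt  # move to the next page; without this the same page repeats
    return collected


DATA = ['book1', 'book2', 'book3', 'book4', 'book5']


def fake_fetch(idx, cnt):
    """Hypothetical stand-in for the search request: slice an in-memory list."""
    return DATA[idx - 1: idx - 1 + cnt]


result = collect_all(fake_fetch, cnt=2)
print(result)
```

    If only the first hit per title is needed, as with cnt = 1 in the question, a single request without the while loop may be enough.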

  • Answer # 2

    Since you also asked about Google Apps Script:

    An example of an ISBN search is here: https://stackoverflow.com/questions/210493

    Based on that example, replace isbn= with title=${encodeURIComponent('artificial intelligence')} to search by title.
    Unlike an ISBN search, a title search can return multiple items, so rather than hard-coding the 0 in item[0], you could loop over the items.