As an example, given a CSV like the following, is there a way in Python to combine just one specific column across rows and output the result as JSON?
Shipping Date,Product Name,Description
20190201,Apple,Aomori Prefecture
20190201,Apple,Price 200 yen
20190201,Apple,Produced by Mr. Tanaka

I can convert the CSV to JSON, but when the description for the same product name is spread over multiple rows, I cannot work out how to merge those rows into a single JSON record.

Here is code that simply writes the CSV contents out as JSON. I am not sure how to change it to get the result described above.
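import csv
import json

with open("../testdata/test.csv") as f:
    reader = csv.DictReader(f, delimiter=",", quotechar='"')

    # The reader is lazy, so write the output while the CSV file is still open.
    with open("testData.json", "w", encoding="utf-8") as out:
        for row in reader:
            # json.dump() has no encoding argument in Python 3; it is set on open() instead.
            json.dump(row, out, ensure_ascii=False, indent=1)
            out.write(",")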

Environment
Python 3.7.1
Anaconda

JSON I want to output

{
"Shipping Date": "20190201",
"Product Name": "Apple",
"Description": "Aomori Prefecture Price 200 yen Produced by Mr. Tanaka"
}

  • Answer # 1

    With pandas, you can read the CSV file, reshape the data, and write the JSON file.

    import pandas as pd
    import io

    data = """
    Shipping Date,Product Name,Description
    20190201,Apple,Aomori Prefecture
    20190201,Apple,Price 200 yen
    20190201,Apple,Produced by Mr. Tanaka
    """
    df = pd.read_csv(io.StringIO(data))
    # Collect the descriptions of each (date, product) group into a list.
    df = df.groupby(['Shipping Date', 'Product Name'])['Description'].apply(list).reset_index()
    # lines=True writes one JSON object per line.
    df.to_json('testData.json', force_ascii=False, orient='records', lines=True)

    Output file (testData.json)

    {"Shipping Date":20190201,"Product Name":"Apple","Description":["Aomori Prefecture","Price 200 yen","Produced by Mr. Tanaka"]}

  • Answer # 2

    import pandas as pd
    import io

    text = """Shipping Date,Product Name,Description
    20190201,Apple,Aomori Prefecture
    20190201,Apple,Price 200 yen
    20190201,Apple,Produced by Mr. Tanaka"""
    df = pd.read_csv(io.StringIO(text))
    g = df.groupby(['Shipping Date', 'Product Name'])
    data = {}
    for k in g.groups:  # k is a (Shipping Date, Product Name) tuple
        print(type(k))  # <class 'tuple'>
        data['Shipping Date'] = int(k[0])  # plain int rather than a NumPy integer
        data['Product Name'] = k[1]
        # Join the group's descriptions into one space-separated string.
        data['Description'] = ' '.join(g.get_group(k)['Description'].values.tolist())
    print(data)
    # {'Shipping Date': 20190201, 'Product Name': 'Apple', 'Description': 'Aomori Prefecture Price 200 yen Produced by Mr. Tanaka'}
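
For comparison, the same grouping can probably be done without pandas, staying close to the csv/json code in the question. The following is only a minimal sketch under some assumptions: the sample data is modeled on the question's CSV, the column names are taken from the desired JSON, and `merge_descriptions` with its parameters (`key_fields`, `text_field`, `sep`) is a hypothetical helper written for this illustration, not part of any library. It keeps the shipping date as a string, as in the desired output.

```python
import csv
import io
import json

# Sample data modeled on the question; in practice, pass an open file object instead.
CSV_TEXT = """Shipping Date,Product Name,Description
20190201,Apple,Aomori Prefecture
20190201,Apple,Price 200 yen
20190201,Apple,Produced by Mr. Tanaka
"""


def merge_descriptions(lines, key_fields=("Shipping Date", "Product Name"),
                       text_field="Description", sep=" "):
    """Hypothetical helper: group rows by key_fields and join text_field with sep."""
    grouped = {}  # dicts preserve insertion order on Python 3.7+
    # skipinitialspace=True tolerates a space after each comma in the CSV.
    for row in csv.DictReader(lines, skipinitialspace=True):
        key = tuple(row[name] for name in key_fields)
        grouped.setdefault(key, []).append(row[text_field].strip())
    # Build one record per group: the key columns plus the joined text column.
    return [
        {**dict(zip(key_fields, key)), text_field: sep.join(parts)}
        for key, parts in grouped.items()
    ]


records = merge_descriptions(io.StringIO(CSV_TEXT))
print(json.dumps(records, ensure_ascii=False, indent=1))
```

Because `csv.DictReader` reads every field as text, no extra conversion is needed to keep "20190201" as a string. To write a file instead of printing, `json.dump(records, f, ensure_ascii=False, indent=1)` on a file opened with `encoding="utf-8"` should work the same way.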