
Thank you in advance for your help.
I've been learning Python for about a month.
Now that I've learned the basic syntax, I'm studying how to use pandas.

As pandas practice, I would like to process population trends by prefecture and finally display them as a line graph with matplotlib.

The data used is that of Table No. 3 in the link below.
Link content

This is the code I have written so far:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(r"C:\Users\sirar\Desktop\Python related\web scraping\lesson\20201110\c03.csv", encoding="shift-jis")
df = df[df["age 5 years old"] == "total number"]
df = df.loc[:, ["Prefecture name", "AD (year)", "Population (total number)"]]
df

I was able to extract the data with the extra rows and columns removed, but when I plot it with matplotlib in this state, I do not get the graph I expect.
I think I ultimately need the prefecture names as columns and the years as the index (or vice versa). Is that correct? If so, how do I do that?

  • Answer # 1

    I think I ultimately need the prefecture names as columns and the years as the index (or vice versa). Is that correct?

    No. It is enough to group the rows by prefecture and plot each group.

    import pandas as pd
    import matplotlib.pyplot as plt
    df = pd.read_csv('c03.csv', encoding="shift-jis")
    df = df[df["age 5 years"] == "total"]
    grp = df.groupby('prefecture code')
    # To plot every prefecture instead, use: for g, dfg in grp:
    for g, dfg in list(grp)[:5]:  # Show only the first 5 groups because there are too many
        label = dfg['prefecture name'].tolist()[0]  # Use the first row's prefecture name as the label
        plt.plot(dfg['year (year)'], dfg['population (total)'], label=label)
    # MS Gothic is a Japanese-capable font, so the legend labels render correctly
    plt.legend(prop={"family": "MS Gothic"}, borderpad=2)
    plt.show()
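
For comparison, the layout described in the question (years as the index, one column per prefecture) can probably be produced with DataFrame.pivot. The following is only a minimal sketch: the small DataFrame stands in for the filtered CSV, its prefecture names and population values are made-up placeholders, the column names are taken from the question's code, and it assumes each prefecture/year pair appears only once after filtering.

```python
import pandas as pd

# Placeholder rows standing in for the filtered CSV (values are invented for illustration)
long_df = pd.DataFrame({
    "Prefecture name": ["Prefecture A", "Prefecture A", "Prefecture B", "Prefecture B"],
    "AD (year)": [2000, 2005, 2000, 2005],
    "Population (total number)": [100, 110, 200, 190],
})

# Reshape from long to wide: one row per year, one column per prefecture.
# pivot raises an error if a (year, prefecture) pair occurs more than once.
wide = long_df.pivot(
    index="AD (year)",
    columns="Prefecture name",
    values="Population (total number)",
)

print(wide)
```

With the data in this shape, calling wide.plot() followed by plt.show() should draw one line per prefecture, with the year on the x-axis.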