
I would like to practice data visualization using atmospheric environment measurement data.

As a first pre-processing step, I want to treat characters such as @ and * that appear in the measured values (in columns named like "Measured value (1 o'clock)") as missing values,
and change the data type from object to int while leaving the other numbers unchanged.
However, when I wrote the code below, every value became a missing value.

The contents of the data are like this.
Measured value (2 o'clock) Measured value (3 o'clock) Measured value (4 o'clock) ... Measured value (15 o'clock) Measured value (16 o'clock) Measured value (17 o'clock) Measured value (18 o'clock) Measured value (19 o'clock) \
0 * 0 0 ... 0 0 0 0 0
1 * 1 1 ... 5 4 3 3 3
2 * 1 1 ... 5 4 3 3 3
3 * 45 47 ... 48 49 47 43 40
4 3 1 1 ... 3 5 2 2 2
.. ... ... ... ... ... ... ... ...
355 14 15 15 ... 16 16 @ @ @
356 0 0 1 ... 4 4 @ @ @
357 4 3 5 ... 7 9 @ @ @
358 4 3 6 ... 11 13 @ @ @
359 10 12 12 ... 11 11 @ @

Applicable source code
def process(a):
    if a == '@' or '*':
        return None
data["Measured value 1 o'clock"] = data["Measured value (1 o'clock)"].apply(process)
data["Measured value 2 o'clock"] = data["Measured value (2 o'clock)"].apply(process)
data["Measured value 3 o'clock"] = data["Measured value (3 o'clock)"].apply(process)
data["Measured value 4 o'clock"] = data["Measured value (4 o'clock)"].apply(process)
data["Measured value 5 o'clock"] = data["Measured value (5 o'clock)"].apply(process)
data["Measured value 6 o'clock"] = data["Measured value (6 o'clock)"].apply(process)
data["Measured value 7 o'clock"] = data["Measured value (7 o'clock)"].apply(process)
data["Measured value 8 o'clock"] = data["Measured value (8 o'clock)"].apply(process)
data["Measured value 9 o'clock"] = data["Measured value (9 o'clock)"].apply(process)
data["Measured value 10 o'clock"] = data["Measured value (10 o'clock)"].apply(process)
data["Measured value 11 o'clock"] = data["Measured value (11 o'clock)"].apply(process)
data["Measured value 12 o'clock"] = data["Measured value (12 o'clock)"].apply(process)
data["Measured value 13 o'clock"] = data["Measured value (13 o'clock)"].apply(process)
data["Measured value 14 o'clock"] = data["Measured value (14 o'clock)"].apply(process)
data["Measured value 15 o'clock"] = data["Measured value (15 o'clock)"].apply(process)
data["Measured value 16 o'clock"] = data["Measured value (16 o'clock)"].apply(process)
data["Measured value 17 o'clock"] = data["Measured value (17 o'clock)"].apply(process)
data["Measured value 18 o'clock"] = data["Measured value (18 o'clock)"].apply(process)
data["Measured value 19 o'clock"] = data["Measured value (19 o'clock)"].apply(process)
data["Measured value 20 o'clock"] = data["Measured value (20 o'clock)"].apply(process)
data["Measured value 21 o'clock"] = data["Measured value (21 o'clock)"].apply(process)
data["Measured value 22 o'clock"] = data["Measured value (22 o'clock)"].apply(process)
data["Measured value 23 o'clock"] = data["Measured value (23 o'clock)"].apply(process)
data["Measured value 24 o'clock"] = data["Measured value (24 o'clock)"].apply(process)

I suspect an else branch is needed to keep the numeric values as they are, but
I can't work out what it should return.
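
For reference, two things in the process function above lead to every value becoming missing. The condition `a == '@' or '*'` is evaluated as `(a == '@') or '*'`, and the non-empty string `'*'` is always truthy, so the first branch is always taken. Even if it were not, a function with no other return statement returns None implicitly. The following is a minimal sketch of one possible correction; the `sample` Series and its values are made up for illustration, and the conversion to pandas' nullable `Int64` dtype is just one way to get integers alongside missing values.

```python
import pandas as pd


def process(a):
    # Test membership explicitly instead of writing `a == '@' or '*'`.
    if a in ("@", "*"):
        return None
    # Return every other value unchanged.
    return a


# Hypothetical sample standing in for one "Measured value" column.
sample = pd.Series(["*", "45", "3", "@"], dtype="object")
cleaned = sample.apply(process)

# to_numeric turns the remaining strings into numbers (None becomes NaN);
# the nullable Int64 dtype can then hold integers together with missing values.
as_int = pd.to_numeric(cleaned, errors="coerce").astype("Int64")
print(as_int)
```

A plain `int` dtype cannot represent missing values, which is why the sketch uses `Int64` (capital I) rather than `int`.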

  • Answer # 1

    When reading the file with read_csv(), you should specify the values to be treated as missing in the na_values argument.

    import pandas as pd
    df = pd.read_csv("kankyodata48.csv", encoding="shiftjis", na_values=["*", "@"])
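
The effect of na_values can be checked without the original file. The sketch below is only an illustration: it reads a small in-memory CSV whose column names (`h1`, `h2`, `h3`) and values are invented, and it assumes the measured values are whole numbers. Columns that contain missing values come back as float64, so the last step converts them to the nullable `Int64` dtype to obtain integers.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; actual column names differ.
csv_text = "h1,h2,h3\n*,0,0\n3,1,@\n10,12,12\n"

# "*" and "@" are parsed as missing values while the file is read.
df = pd.read_csv(io.StringIO(csv_text), na_values=["*", "@"])

# Convert every column to the nullable integer dtype so that
# missing entries are kept and the other values stay integers.
df = df.astype("Int64")
print(df.dtypes)
```

With this approach no per-column apply call is needed, since the markers never enter the DataFrame as strings.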
