I want to build a process that reads CSV files with pandas.
Currently, I have a file uploader built with Django that reads the uploaded CSV files.

However, if I upload a file in which some rows have a different number of columns, read_csv raises a ParserError.

```python
import pandas as pd

# header=None: the file has no header row, so pandas numbers the columns itself
df = pd.read_csv('CSV_file', header=None, low_memory=False)
```

This is how the file is currently read with pandas.
I understand that if I pass column names with the names argument, the missing fields are filled in as missing values (NaN) and the file is read, and that if I set error_bad_lines=False, the malformed lines are skipped and the rest of the file is read.
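A minimal sketch of the two behaviours described above, using a made-up in-memory sample instead of an uploaded file. Note that `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0; the sketch uses its replacement, `on_bad_lines="skip"`. The column names `a` to `d` are arbitrary.

```python
import io

import pandas as pd

# Invented sample: the second row is short, the third row is long.
SAMPLE = "1,2,3\n4,5\n6,7,8,9\n"

# 1) names fixes the width at len(names): short rows are padded with NaN,
#    and because the widest row fits, nothing raises.
df_names = pd.read_csv(io.StringIO(SAMPLE), header=None, names=["a", "b", "c", "d"])
print(df_names)

# 2) on_bad_lines="skip" (pandas >= 1.3, replaces error_bad_lines=False):
#    the width comes from the first line, rows with too many fields are dropped,
#    and rows with too few fields are still padded with NaN.
df_skip = pd.read_csv(io.StringIO(SAMPLE), header=None, on_bad_lines="skip")
print(df_skip)
```

Either way the file loads without an error, which is why neither option enforces a fixed column count by itself.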

On the other hand, the CSV files I read have a required number of columns, and processing must be aborted if a file's column count differs from that requirement. I realized that if I pass the names argument, every file would end up meeting the requirement.
Also, since I use header=None, pandas assigns column names on its own, but those automatically assigned names do not make it read the file the way names passed through the names argument do.
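To illustrate the difference, a small sketch with an invented sample: without names, the automatic labels are 0..n-1 with n taken from the first line, so a later row with more fields raises ParserError, while a shorter row is silently padded with NaN.

```python
import io

import pandas as pd

# Invented sample: the first line has 3 fields, so pandas assumes 3 columns.
SAMPLE = "1,2,3\n4,5\n6,7,8,9\n"

# The automatic labels do not fix the width in advance the way names does,
# so the longer third row cannot be placed and the C parser raises.
raised = False
try:
    pd.read_csv(io.StringIO(SAMPLE), header=None)
except pd.errors.ParserError as exc:
    raised = True
    print("ParserError:", exc)

# Reading only the first two lines shows the automatic labels and that
# the short row is padded with NaN rather than rejected.
df = pd.read_csv(io.StringIO("1,2,3\n4,5\n"), header=None)
print(list(df.columns))  # [0, 1, 2]
```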

In such a case, is there a way to keep validating against the specified number of columns without causing a ParserError?

Thank you in advance for your help.

  • Answer #1

    Frankly, this kind of processing is not a good fit for pandas.

    pandas does its best to read anything that looks like a table, but it is not much help with data that is not a proper table to begin with.

    For now, one option is to process the raw text yourself, for example with the standard csv module, until it has the shape of the table you want, and only then convert it to a pandas DataFrame.
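    A minimal sketch of that approach, assuming the uploaded file has already been decoded to a string. EXPECTED_COLUMNS, ColumnCountError and read_strict_csv are hypothetical names, not part of pandas or Django.

```python
import csv
import io

import pandas as pd

EXPECTED_COLUMNS = 3  # hypothetical required column count


class ColumnCountError(ValueError):
    """Hypothetical error raised when a record has the wrong number of fields."""


def read_strict_csv(text, expected=EXPECTED_COLUMNS):
    """Check every record's width with the csv module, then build a DataFrame."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:  # csv.reader yields [] for a blank line; skip it
            continue
        if len(row) != expected:
            # Abort on the first bad record, as the question's design requires.
            raise ColumnCountError(
                f"line {line_no}: expected {expected} fields, got {len(row)}"
            )
        rows.append(row)
    # Every value is still a string here; convert dtypes afterwards as needed.
    return pd.DataFrame(rows)
```

    Because the check runs before pandas sees the data, both too-short and too-long rows are rejected. The reported number is the record number from csv.reader, which matches the line number unless fields contain embedded newlines.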

    It is probably not impossible to do this inside pandas by looking at where the missing values appear, but...
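    One way to attempt this in pandas, as a sketch under assumptions: a callable on_bad_lines (pandas 1.4 or later, engine="python" only) collects rows with too many fields, and a NaN in the last expected column flags rows that were probably too short. EXPECTED_COLUMNS and record_bad_line are hypothetical names, and the sample data is invented.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = 3  # hypothetical required column count
SAMPLE = "1,2,3\n4,5\n6,7,8,9\n10,11,12\n"

too_long = []  # rows with more fields than expected, as lists of strings


def record_bad_line(bad_line):
    """Called by pandas for a row with too many fields; returning None drops it."""
    too_long.append(bad_line)
    return None


df = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    names=list(range(EXPECTED_COLUMNS)),
    engine="python",  # a callable on_bad_lines requires the python engine
    on_bad_lines=record_bad_line,
)

# Short rows are padded with NaN, so a NaN in the last expected column marks
# a row that was probably too short. Caveat: a genuinely empty trailing field
# such as "7,8," also becomes NaN and cannot be told apart this way.
too_short = df[df[EXPECTED_COLUMNS - 1].isna()]

if too_long or not too_short.empty:
    print(f"rejected: {len(too_long)} long row(s), {len(too_short)} short row(s)")
```

    The caveat about empty trailing fields is exactly why this route is fragile compared with checking the raw rows.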

    Also, if you can drop the design requirement itself ("there is a required number of columns, and processing is aborted if a file's column count differs from it"), that is the fastest solution.

  • Answer #2

    What is the purpose of the validation?

    If the goal is to reject files that fail the check, just catch the error with except and end the processing there.

    And if a file that fails the check raised no error, wouldn't the validation be meaningless?
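    Following that suggestion, a minimal sketch that treats the ParserError itself as the validation result. load_or_reject and EXPECTED_COLUMNS are hypothetical names. Note that with header=None the width comes from the first line, so a too-long later row raises, but a too-short later row is padded with NaN and still passes this check.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = 3  # hypothetical required column count


def load_or_reject(text, expected=EXPECTED_COLUMNS):
    """Return (DataFrame, None) on success or (None, error message) on rejection."""
    try:
        df = pd.read_csv(io.StringIO(text), header=None)
    except pd.errors.ParserError as exc:
        # A row had more fields than the first line: stop processing here.
        return None, f"malformed CSV: {exc}"
    except pd.errors.EmptyDataError:
        return None, "empty file"
    if df.shape[1] != expected:
        # The first line itself has the wrong number of columns.
        return None, f"expected {expected} columns, got {df.shape[1]}"
    return df, None
```

    In a Django view, the returned message could be shown to the user instead of continuing with the upload.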