
I have a question about using iloc for a DataFrame in Python.

I understand that iloc extracts data by row and column number.
But why does the following code extract every column except "Age"?

csv_titanic.iloc[:, csv_titanic.columns != "Age"]

Also, why does the following code, which I wrote to extract only the "Age" column, give an error?

csv_titanic.iloc[:, csv_titanic.columns = "Age"]

Thanks for your response.

  • Answer # 1

    You can select rows and columns with a Boolean array (a mask). NumPy has the same feature, and pandas probably took it from there.

    For a detailed explanation, see the documentation linked below, or search for "pandas boolean selection" to find explanatory articles.

      

    "Another common operation is the use of boolean vectors to filter the data."
    (from "Indexing and selecting data", pandas 1.0.3 documentation)

      

    csv_titanic.columns = "Age"  

    Python's equality comparison operator is ==. A single = is assignment and cannot be used as a comparison, so this line is a syntax error.
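    To illustrate both points, here is a minimal sketch on a small hypothetical DataFrame standing in for csv_titanic (the column names and values are made up, not the real Titanic CSV). Comparing the columns Index with != or == yields a Boolean array that iloc accepts as a column mask, the same masking works on a plain NumPy array, and the version with a single = does not even compile.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for csv_titanic (made-up columns and values)
csv_titanic = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 7.93]}
)

# Comparing the columns Index with a string gives a NumPy Boolean array
mask_not_age = csv_titanic.columns != "Age"
print(mask_not_age)  # [ True False  True]

# iloc keeps only the columns where the mask is True -> every column except Age
print(csv_titanic.iloc[:, mask_not_age])

# With ==, the mask is True only for Age -> a one-column DataFrame
print(csv_titanic.iloc[:, csv_titanic.columns == "Age"])

# The same Boolean masking works on a plain NumPy array
arr = np.array([10, 20, 30])
print(arr[np.array([True, False, True])])  # [10 30]

# A single "=" inside the brackets is a syntax error, not a comparison
try:
    compile('csv_titanic.iloc[:, csv_titanic.columns = "Age"]', "<question>", "eval")
    question_code_compiles = True
except SyntaxError as exc:
    question_code_compiles = False
    print("SyntaxError:", exc.msg)
```

    So the fix for the second snippet is simply to write == instead of =.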

  • Answer # 2

    hayataka2049 has already explained why the question's code does or does not extract the data, so this answer is only supplementary.

    The pandas.Index class has a get_loc() method that returns the integer position of a label in the Index (here, a column name), so another option is to use it when specifying a column with iloc, as follows:

    csv_titanic.iloc[:, csv_titanic.columns.get_loc("Age")]

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.get_loc.html

    Note the difference in result type: when the columns are specified with a Boolean array, the result is a DataFrame, but get_loc() returns a single integer position (the 'Index value'), so the result is a Series.

    import pandas as pd
    import numpy as np
    # Generate dummy data
    df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=['A', 'B', 'C'])
    #    A  B  C
    # 0  0  1  2
    # 1  3  4  5
    # 2  6  7  8
    # A comparison operation gives a Boolean array, as below
    print(df.columns == 'B')
    # [False  True False]
    # Passing a Boolean array as the column selector
    # (the result is a DataFrame)
    print(df.iloc[:, df.columns == 'B'])
    #    B
    # 0  1
    # 1  4
    # 2  7
    # Get the integer position (Index value) for a column name
    print(df.columns.get_loc('B'))
    # 1
    # Select by passing that integer position
    # (the result is a Series)
    print(df.iloc[:, df.columns.get_loc('B')])
    # 0    1
    # 1    4
    # 2    7
    # Name: B, dtype: int64
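    As a small extension that is not part of the original answer, the sketch below checks the result types with isinstance. It also shows one way to keep a DataFrame while still using get_loc(): wrap the position in a list, because iloc returns a DataFrame when the column selector is a list of integers. Whether you want a DataFrame or a Series is an assumption about your use case.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the example above
df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=["A", "B", "C"])

pos = df.columns.get_loc("B")  # integer position of column B

by_mask = df.iloc[:, df.columns == "B"]  # Boolean array  -> DataFrame
by_scalar = df.iloc[:, pos]              # single integer -> Series
by_list = df.iloc[:, [pos]]              # list of integers -> DataFrame

for name, result in [("mask", by_mask), ("scalar", by_scalar), ("list", by_list)]:
    # Print the result type and shape for each selection style
    print(name, type(result).__name__, result.shape)
```

    This should print "mask DataFrame (3, 1)", "scalar Series (3,)" and "list DataFrame (3, 1)".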
