
I want to display my processing results as a pandas DataFrame in real time while the code is running.
Each iteration of a for loop produces one row of the DataFrame.
Because the loop does a lot of processing, I want to check the output while it is still running. How can I display the DataFrame during execution and keep updating it?


The output should look like this. What I want is to see the displayed result update every time a row is added.

I want to do something like the pandas DataFrame version of print('aaa', flush=True).

A DataFrame has no flush, and if I pass it to print, the DataFrame table formatting is lost and it is not displayed as expected.

AttributeError: 'DataFrame' object has no attribute 'flush'
Corresponding source code
At the end of each pass through the for loop, the following DataFrame processing is performed. It's messy, but the variables are all DataFrames.
# Data extraction
import pandas as pd
import requests
from bs4 import BeautifulSoup

# df, URLs_list and Keywords_list are assumed to be defined earlier (not shown)
## I want to make a list to determine whether the retrieved HTML was valid or an error
for i in range(len(URLs_list)):  # number of URLs
    dfname = 'df' + str(i)
    for url in URLs_list:
        try:
            response = requests.get(url)
            response.encoding = response.apparent_encoding
            soup = BeautifulSoup(response.text, 'html.parser')
            title = []
            for i in soup.find_all('p'):  ## Remove line feed code
                title.append(i.getText()[0:].replace('\n', ''))
            for i in soup.find_all('td'):
                title.append(i.getText()[0:].replace('\n', ''))
            # Convert list to one continuous string
            mojiretu = ''.join(title)
            # List search results
            result = []
            for keyword in Keywords_list:
                result.append(mojiretu.count(str(keyword)))
            dfname = pd.DataFrame(result, columns=[url])
            df[url] = dfname
        except Exception:
            result = []
            for i in Keywords_list:
                result.append(0)
            dfname = pd.DataFrame(result, columns=[url])
            df[url] = dfname
    # Process into a pandas DataFrame
    dfset = df.set_index('Keywords')
    reversedf = dfset.transpose()
    reversedf = reversedf.astype(int)
    reversedf['Total'] = reversedf.sum(axis=1)
    reversedf_ascend = reversedf.sort_values('Total', ascending=False)
    display(reversedf_ascend)

As mentioned above, I gave up on print with flush=True because it does not produce DataFrame-style output. Beyond that, I don't know what else to try and haven't tried anything.

If I do something like print(reversedf_ascend), I get output like this:

Keywords Professor Assistant \
https://www.bbb 0 0
https://www.aaa 0 0

Keywords Head Total
https://www.bbb 0 0
https://www.aaa 0 0

Thank you very much.
I tried the following method that was suggested to me. It feels closer, but the result is still not what I expected.
from IPython.display import display
display(df)


  • Answer # 1

    From my exchange with the questioner, I understood the intent to be: "I want to show a DataFrame in Jupyter's table format at any point during code execution." The problem is that the table formatting is lost when a print statement is used.

    In that case, you can get the desired output by using display instead of print. Note that display is specific to Jupyter (IPython).

    Use it as follows.

    from IPython.display import display
    display(df)

    Usage result example
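
As a rough illustration of calling display inside a loop, here is a minimal sketch. The URLs, keywords, and counts are made-up placeholders, not the questioner's data, and the fallback to print outside Jupyter is an assumption added so the snippet also runs in a plain interpreter.

```python
import pandas as pd

try:
    from IPython.display import display  # renders a table in Jupyter
except ImportError:
    display = print  # assumption: plain-text fallback outside Jupyter

# Hypothetical per-URL keyword counts standing in for the real scraping step.
keywords = ["Professor", "Assistant", "Head"]
results = {"https://www.aaa": [1, 0, 2], "https://www.bbb": [0, 3, 1]}

rows = []
for url, counts in results.items():
    # One row per iteration, labeled by URL.
    rows.append(pd.Series(counts, index=keywords, name=url))
    progress = pd.DataFrame(rows)  # rebuild from all rows collected so far
    display(progress)  # emits a new table on every iteration
```

Each call adds another table below the previous one, so the cell output grows as the loop runs.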

    Postscript

    I confirmed the following additional request: "For a DataFrame that already has an index and is initialized with 0, I want to display only the rows that have been filled in so far, as values (zero or nonzero) are written in order inside the for loop."

    This is hard to handle case by case inside the for loop, so I recommend the following approach.

    Make it possible to tell whether a 0 means "no value written yet" or "a value of 0 was written". If the DataFrame is initialized with NaN instead, NaN means no value has been written yet and 0 means a value of 0 was written.

    When displaying, skip rows that contain NaN.

    Note that, by pandas' design, columns containing NaN have float dtype, so if you want to display or convert to integers, add astype(int) as appropriate.

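    import numpy as np

    df = df.replace(0, np.nan)  # Convert 0 to NaN, assuming the frame was initialized with 0
    for label in df.index:  # repeat processing (the loop variable is only illustrative)
        # ... processing that writes values into this row of df ...
        display(df.dropna())  # Rows with at least one NaN are not output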

    A similar approach would be to add a new "valid" column to the DataFrame, initialize it to False, set it to True when data is written, and filter the display on it. However, I think the NaN approach is better, because the valid column requires extra processing on every write and the unneeded column has to be initialized and later removed.
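
The NaN approach described above could look roughly like this end to end. This is only a sketch: the URLs, keywords, and the fake_counts dictionary are hypothetical stand-ins for the real scraping results, and the print fallback is an assumption for running outside Jupyter.

```python
import numpy as np
import pandas as pd

try:
    from IPython.display import display
except ImportError:
    display = print  # assumption: plain-text fallback outside Jupyter

urls = ["https://www.aaa", "https://www.bbb", "https://www.ccc"]
keywords = ["Professor", "Assistant", "Head"]

# NaN everywhere: NaN means "not processed yet", 0 means "counted zero".
df = pd.DataFrame(np.nan, index=urls, columns=keywords)

# Hypothetical results; the third URL is deliberately left unprocessed.
fake_counts = {"https://www.aaa": [0, 0, 0], "https://www.bbb": [2, 0, 1]}

for url, counts in fake_counts.items():
    df.loc[url] = counts  # write this URL's row
    done = df.dropna().astype(int)  # keep finished rows, restore integer dtype
    display(done)
```

A row of genuine zeros (the first URL here) is still shown, while the untouched third row stays hidden because it is all NaN.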

  • Answer # 2

    How about adding a step that clears the previous output before each display?

    from IPython.display import display, clear_output
    for i in range(len(URLs_list)):  # number of URLs
        # ...
        clear_output(wait=True)
        display(reversedf_ascend)


    It works without the wait=True argument, but the display updates more smoothly with it.
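
For reference, a self-contained sketch of this clear-and-redisplay pattern follows. With wait=True, the old output is only cleared once new output is ready to replace it, which is what reduces flicker. The sample counts are invented, and the no-op clear_output fallback is an assumption so the code also runs where IPython is not installed.

```python
import pandas as pd

try:
    from IPython.display import clear_output, display
except ImportError:
    # Assumption: outside Jupyter, clearing is a no-op and display prints.
    def clear_output(wait=False):
        pass
    display = print

keywords = ["Professor", "Assistant", "Head"]
# Hypothetical per-URL counts standing in for the real scraping step.
samples = [
    ("https://www.aaa", [1, 0, 2]),
    ("https://www.bbb", [0, 3, 1]),
    ("https://www.ccc", [0, 0, 0]),
]

counts_by_url = {}
for url, counts in samples:
    counts_by_url[url] = counts
    # Rebuild the table from everything collected so far.
    view = pd.DataFrame.from_dict(counts_by_url, orient="index", columns=keywords)
    view["Total"] = view.sum(axis=1)
    view = view.sort_values("Total", ascending=False)
    clear_output(wait=True)  # replace the previous table instead of stacking
    display(view)
```

In a notebook this leaves a single table in the cell output that grows and re-sorts as each URL finishes.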