
Because of the GIL, Python's multithreading allows only one thread to execute Python bytecode at a time within a process, so even if you run a program with 10 threads, it is really running one thread at a time and only switching between them fast enough to look parallel.
For work that consumes a lot of CPU, multithreading is therefore pointless; only work that spends time waiting, such as I/O, can be expected to speed up.

My question: can I expect a speedup by multithreading multiple file I/O operations using Python's built-in open, as shown below?

from concurrent.futures import ThreadPoolExecutor

def read_file(file_name):
    # "with" closes the file even if read() raises
    with open(file_name, "r") as f:
        return f.read()

file_list = [str(i) + ".txt" for i in range(100)]
with ThreadPoolExecutor(max_workers=100) as executor:
    results = list(executor.map(read_file, file_list))

Does the built-in open (and the subsequent read) spend enough time waiting that multithreading can speed it up?
Also, the example above reads file contents; is it the same for writing contents to a file?
I would appreciate any explanation.

  • Answer # 1

    I think it depends on the type of disk.
    With an HDD, multithreading cannot physically speed things up.
    An NVMe SSD can reportedly benefit from multithreaded access, but there seems to be little difference compared with sequential access.
    Windows 10 generation parts selection SSD edition

    Since disk I/O itself becomes the bottleneck, writing small pieces of data many times is slow. Basically, it is more efficient to buffer the data in memory and write it out in one large chunk.
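    A minimal sketch of that point (file names and sizes are illustrative): the first loop forces one write syscall per small chunk by disabling buffering, while the second joins the data in memory and writes it once.

```python
import os
import tempfile

chunks = [b"x" * 100 for _ in range(1000)]
tmpdir = tempfile.mkdtemp()

# Many small writes: buffering=0 issues one write syscall per chunk.
small_path = os.path.join(tmpdir, "small.bin")
with open(small_path, "wb", buffering=0) as f:
    for chunk in chunks:
        f.write(chunk)

# One large write: join the data in memory first, then write once.
big_path = os.path.join(tmpdir, "big.bin")
with open(big_path, "wb") as f:
    f.write(b"".join(chunks))

# Both files are identical; the second approach issues far fewer syscalls.
print(os.path.getsize(small_path), os.path.getsize(big_path))
```

    (Note that a normal buffered open already batches small writes for you; buffering=0 above only exists to make the contrast visible.)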

  • Answer # 2

    Can I expect a speedup by processing multiple file I/Os in multiple threads?

    If by "speedup" you mean faster than processing sequentially, then the answer is yes.
    For example, suppose that for each file:

    Reading the file (disk-I/O-bound) takes 1 second

    Processing it (CPU-bound) takes 1 second

    Writing the result to the network (network-I/O-bound) takes 1 second

    Doing this sequentially for 10 files would take 30 seconds, but multithreading lets stages that consume different resources overlap, so in the best case the overall time drops to about a third, roughly 10 seconds. (Strictly speaking, overhead and contention for the same resources mean it will not work out exactly like this.)
    On the other hand, if the program is bottlenecked on a single resource (for example, when most of the processing time is disk I/O), consumption of that resource cannot be parallelized, so no speedup can be expected.
    Still, taken as a general question, I think the answer is yes.
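    The three-stage pipeline above can be sketched like this. The stages are simulated with time.sleep() (a simplification: sleep releases the GIL, so even the "CPU" stage overlaps here, which real CPU-bound Python code would not); the numbers and function names are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle(item):
    time.sleep(0.05)  # "read the file" (disk I/O wait)
    time.sleep(0.05)  # "process" (simulated; real CPU work holds the GIL)
    time.sleep(0.05)  # "write to the network" (network I/O wait)
    return item

items = list(range(10))

# Sequential: 10 items * 0.15 s each.
start = time.perf_counter()
for i in items:
    handle(i)
sequential = time.perf_counter() - start

# Threaded: the waits overlap across 10 worker threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as ex:
    list(ex.map(handle, items))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

    On my assumptions the threaded version finishes in roughly the time of a single item, while the sequential version pays the full cost per item.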

    If by "speedup" you mean that the disk I/O itself is parallelized and made faster, then the answer is no.

    Does the built-in open spend enough time waiting that multithreading can speed it up?

    In general, I/O system calls consume resources other than the CPU, so the CPU often sits idle waiting for the I/O to complete. If you call that waiting "sleep", then yes, it sleeps. (During such blocking calls, CPython also releases the GIL, so other threads can run in the meantime.)
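    A small sketch of that behavior, using a pipe as the blocking I/O source (the pipe and the counting loop are just illustrative): a thread blocked in os.read() has released the GIL, so the main thread keeps executing. If the blocked thread still held the GIL, the loop below could never run and os.write() would never be reached.

```python
import os
import threading

r, w = os.pipe()
result = {}

def blocking_read():
    # Blocks in the read(2) syscall until data arrives; the GIL is
    # released for the duration of the call.
    result["data"] = os.read(r, 5)

t = threading.Thread(target=blocking_read)
t.start()

# The main thread makes progress while the reader is blocked.
counter = 0
for _ in range(1_000_000):
    counter += 1

os.write(w, b"hello")  # unblock the reader
t.join()
os.close(r)
os.close(w)

print(counter, result["data"])
```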

    The example above reads file contents; is it the same for writing contents to a file?

    As mentioned above, writes and reads are the same in the sense that their I/O system calls consume resources other than the CPU.

  • Answer # 3

    File access cannot be parallelized, so you can't expect a speedup.