Because of the GIL, Python's multithreading effectively allows only one thread per process to execute Python bytecode at a time. Even if you run a program with 10 threads, the interpreter just switches rapidly between them on a single core, which merely looks like parallel execution.
Under this model, multithreading is pointless for work that mostly consumes CPU computing resources; only workloads with waiting time, such as I/O, can be expected to speed up.
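As a minimal sketch of that difference (cpu_task, io_task, and timed are made-up names for illustration; the exact numbers will vary by machine):

import time
from concurrent.futures import ThreadPoolExecutor

def cpu_task():
    # CPU-bound: holds the GIL while computing
    sum(i * i for i in range(10**6))

def io_task():
    # I/O-like: releases the GIL while sleeping
    time.sleep(0.1)

def timed(task, workers):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for _ in range(8):
            executor.submit(task)
    return time.perf_counter() - start

print("cpu, 1 thread :", timed(cpu_task, 1))
print("cpu, 8 threads:", timed(cpu_task, 8))  # roughly the same as 1 thread
print("io,  1 thread :", timed(io_task, 1))
print("io,  8 threads:", timed(io_task, 8))   # roughly 1/8 of 1 thread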
The question is: can we expect a speedup by multithreading multiple file I/O operations using Python's built-in function open, as shown below?
from concurrent.futures import ThreadPoolExecutor

def read_file(file_name):
    f = open(file_name, "r")
    text = f.read()
    f.close()
    return text

file_list = [str(i) + ".txt" for i in range(100)]
with ThreadPoolExecutor(max_workers=100) as executor:
    results = executor.map(read_file, file_list)
    results = list(results)
Does Python's built-in function open involve enough waiting time ("time to sleep") for multithreading to accelerate it?
Also, the above example reads the contents of files, but is it the same for writing contents to a file?
I would appreciate it if someone could explain.
-
Answer # 1
-
Answer # 2
Is it possible to expect a speedup by processing multiple file I/Os in multiple threads?
If "speeding up" here means faster than processing the files sequentially, then the answer is yes.
For example, suppose that for each file:
- reading the file (disk-I/O-bound) takes 1 second,
- processing it (CPU-bound) takes 1 second,
- writing the result to the network (network-I/O-bound) takes 1 second.
Doing this sequentially for 10 files would take 30 seconds, but multithreading lets the consumption of these different resources overlap, so in the best case the overall processing time drops to about a third, roughly 10 seconds. (Strictly speaking, thread overhead and contention for the same resource mean it will not work out exactly like this.)
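As a rough illustration of that overlap (handle_file and busy are made-up stand-ins; time.sleep simulates the I/O waits and a busy loop simulates the CPU-bound step):

import time
from concurrent.futures import ThreadPoolExecutor

def busy(seconds):
    # CPU-bound stand-in: holds the GIL while it runs
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass

def handle_file(name):
    time.sleep(1)   # read from disk (I/O wait, GIL released)
    busy(1)         # process the data (CPU-bound, GIL held)
    time.sleep(1)   # write the result to the network (I/O wait, GIL released)
    return name

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    list(executor.map(handle_file, [str(i) + ".txt" for i in range(10)]))
print(time.perf_counter() - start)  # on the order of 10-12 s, versus ~30 s sequentially

The 10 CPU-bound seconds cannot overlap with each other because of the GIL, which is why the total approaches the ~10 seconds of CPU work rather than 3 seconds.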
On the other hand, if a single resource is the bottleneck for the whole program (for example, when most of the processing time is disk I/O), its consumption cannot be parallelized, so no speedup can be expected.
Still, assuming the question is asked in general terms, I think the answer is yes. If "speeding up" instead means that the disk I/O itself is parallelized and becomes faster, then in that sense the answer is no.
Does Python's built-in function open involve enough waiting time ("time to sleep") for multithreading to accelerate it?
In general, I/O system calls consume resources other than the CPU, so the CPU often sits idle waiting for the I/O to complete, leaving CPU time free. If you call that waiting "sleep", then yes, it sleeps. (While a thread is blocked on I/O, CPython also releases the GIL, so other threads can run in the meantime.)
The above example reads the contents of files, but is it the same for writing contents to a file?
As mentioned earlier, writes and reads are the same in the sense that their I/O system calls consume non-CPU resources, so the same reasoning applies.
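If you want to check this on your own environment, here is a minimal benchmark sketch along the lines of the question's code (the timing depends heavily on the OS page cache and the disk, so read the numbers with care):

import time
from concurrent.futures import ThreadPoolExecutor

def read_file(file_name):
    with open(file_name, "r") as f:
        return f.read()

file_list = [str(i) + ".txt" for i in range(100)]

start = time.perf_counter()
sequential = [read_file(name) for name in file_list]
print("sequential:", time.perf_counter() - start)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=100) as executor:
    threaded = list(executor.map(read_file, file_list))
print("threaded:  ", time.perf_counter() - start)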
-
Answer # 3
File access cannot be parallelized, so you can't expect a speedup.
That said, I think it depends on the type of disk.
In the case of an HDD, even multithreaded access cannot physically be sped up.
NVMe SSDs can reportedly benefit from multithreaded access, although there does not seem to be much difference compared with sequential access.
Windows 10 generation parts selection SSD edition
Since disk I/O itself becomes the bottleneck, writing small pieces of data many times is slow. Basically, it is more efficient to buffer the data in memory and write a large amount at once.
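A minimal sketch of that idea (out_small.txt and out_buffered.txt are just example names; note that Python's file objects already do some buffering, so the gap here mainly reflects per-call overhead):

# Many small writes: one write() call per piece of data
with open("out_small.txt", "w") as f:
    for i in range(100000):
        f.write(str(i) + "\n")

# Buffer in memory first, then write everything with a single call
lines = [str(i) + "\n" for i in range(100000)]
with open("out_buffered.txt", "w") as f:
    f.write("".join(lines))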