There is a library called threading in Python and it uses threads (rather than just processes) to implement parallelism. This may be surprising news if you know about the Python’s Global Interpreter Lock, or GIL, but it actually works well for certain instances without violating the GIL. And this is all done without any overhead — simply define functions that make I/O requests and the system will handle the rest.
Global Interpreter Lock
The Global Interpreter Lock reduces the usefulness of threads in Python (more precisely CPython) by allowing only one native thread to execute at a time. This made implementing Python easier to implement in the (usually thread-unsafe) C libraries and can increase the execution speed of single-threaded programs. However, it remains controvertial because it prevents true lightweight parallelism. You can achieve parallelism, but it requires using multi-processing, which is implemented by the eponymous library multiprocessing. Instead of spinning up threads, this library uses processes, which bypasses the GIL.
It may appear that the GIL would kill Python multithreading but not quite. In general, there are two main use cases for multithreading:
- To take advantage of multiple cores on a single machine
- To take advantage of I/O latency to process other threads
In general, we cannot benefit from (1) with threading but we can benefit from (2).
MultiProcessing
The general threading
library is fairly low-level but it turns out that multiprocessing
wraps this in multiprocessing.pool.ThreadPool
, which conveniently takes on the same interface as multiprocessing.pool.Pool
.
One benefit of using threading
is that it avoids pickling. Multi-processing relies on pickling objects in memory to send to other processes. For example, if the timed
decorator did not wraps
the wrapper
function it returned, then CPython would be able to pickle our functions request_func
and selenium_func
and hence these could not be multi-processed. In contrast, the threading
library, even through multiprocessing.pool.ThreadPool
works just fine. Multiprocessing also requires more ram and startup overhead.
Analysis
We analyze the highly I/O dependent task of making 100 URL requests for random wikipedia pages. We compare:
We run each of these requests in three ways and measure the time required for each fetch:
- In serial
- In parallel in a
threading
pool with 10 threads - In parallel in a
multiprocessing
pool with 10 threads
Each request is timed and we compare the results.
Results
Firstly, the per-thread running time for requests
is obviously lower than for selenium
, as the latter requires spinning up a new process to run a PhantomJS
headless browser. It’s also interesting to notice that the individual threads (particularly selenium threads) run faster in serial than in parallel, which is the typical bandwidth vs. latency tradeoff.
In particular, selenium threads are more than twice as slow, problably because of resource contention with 10 selenium processes spinning up at once.
Likewise, all threads run roughly 4 times faster for selenium
requests and roughly 8 times faster for requests
requests when multithreaded compared with serial.
Conclusion
There was no significant performance difference between using threading
vs multiprocessing
. The performance between multithreading and multiprocessing are extremely similar and the exact performance details are likely to depend on your specific application. Threading through multiprocessing.pool.ThreadPool
is really easy, or at least as easy as using the multiprocessing.pool.Pool
interface — simply define your I/O workloads as a function and use the ThreadPool
to run them in parallel.
Improvements welcome! Please submit a pull request to our github.
Michael Li is founder and CEO at The Data Incubator. The company offers curriculum based on feedback from corporate and government partners about the technologies they are using and learning, for masters and PhDs.