Concurrency allows multiple operations to be in progress at the same time. On a single processor, programs only appear to run simultaneously: the CPU rapidly switches between them, doing some work for each process before moving on to the next.
For decades, processors got faster mainly through higher clock speeds, but physical heat and energy limitations have led CPU manufacturers to multicore designs instead, so additional processing power now comes from true parallel computing across multiple cores.
Parallelism in Python
In threading, multiple "threads" of execution exist within a single interpreter. Each thread executes code independently from the others, though they share the same data.
>>> import threading
>>> def thread_hello():
...     other = threading.Thread(target=thread_say_hello, args=())
...     other.start()
...     thread_say_hello()
>>> def thread_say_hello():
...     print('hello from', threading.current_thread().name)
>>> thread_hello()
hello from Thread-1
hello from MainThread
Multiprocessing allows a program to spawn multiple interpreters, or processes, each of which runs independently without sharing data. Shared data must be communicated explicitly between processes.
>>> import multiprocessing
>>> def process_hello():
...     other = multiprocessing.Process(target=process_say_hello, args=())
...     other.start()
...     process_say_hello()
>>> def process_say_hello():
...     print('hello from', multiprocessing.current_process().name)
>>> process_hello()
hello from MainProcess
hello from Process-1
Problems can arise when multiple threads manipulate and access shared data. A race condition occurs when multiple threads access the same data and at least one mutates it. This causes unpredictable behavior. Access to shared data in the presence of mutation must be synchronized to prevent other threads from accessing the data while it is being mutated.
For example, if x = 1 and we run the following lines in parallel in separate threads:

x = x * 2
x = x + 10

the possible final values of x are 12, 11, 22, and 2.
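To see where the surprising values come from, note that each assignment is really a read followed by a write, and the reads and writes of the two threads can interleave. The sketch below simulates one such interleaving step by step; the variable names are illustrative, not part of any API:

```python
x = 1

# Each thread's assignment is a read of x followed by a write of x.
# Here both threads read before either writes:
read_by_doubler = x        # thread running x = x * 2 reads 1
read_by_adder = x          # thread running x = x + 10 also reads 1
x = read_by_doubler * 2    # doubler writes 2
x = read_by_adder + 10     # adder overwrites it with 11
print(x)  # 11
```

Swapping the order of the last two writes yields 2 instead, and the two fully sequential orders yield 12 and 22, accounting for all four possible final values.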
Synchronized data structures
Some data structures provide synchronized operations. For example, a Queue provides synchronized first-in, first-out access to data. The put method adds an item to the Queue, and the get method retrieves the item at the front. The methods are synchronized internally, so items are not lost no matter how thread operations are interleaved.
An example of a producer/consumer relationship built on a Queue:
from queue import Queue
import threading

queue = Queue()

def synchronized_consume():
    while True:
        print('got an item:', queue.get())
        queue.task_done()

def synchronized_produce():
    consumer = threading.Thread(target=synchronized_consume, args=())
    consumer.daemon = True
    consumer.start()
    for i in range(10):
        queue.put(i)
    queue.join()

synchronized_produce()
Another use of a Queue is a parallel web crawler that searches for dead links on a website. By following every link hosted by the same site, it continually adds new URLs to the Queue, processes each URL it retrieves, and marks the task done afterwards. Because the Queue is synchronized, multiple threads can safely add to and remove from it concurrently.
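A minimal sketch of that crawler pattern, with the website replaced by an in-memory dictionary so the example is self-contained (SITE and all names below are invented for illustration; a real crawler would fetch pages over HTTP):

```python
import threading
from queue import Queue

# A mock site: each URL maps to the links on that page; a URL missing
# from the dict stands in for a dead link.
SITE = {
    '/index': ['/about', '/blog'],
    '/about': ['/index'],
    '/blog': ['/index', '/missing'],   # '/missing' is a dead link
}

url_queue = Queue()
seen = set()
dead = []
state_lock = threading.Lock()  # protects seen and dead

def crawl():
    while True:
        url = url_queue.get()
        links = SITE.get(url)
        if links is None:
            with state_lock:
                dead.append(url)
        else:
            for link in links:
                with state_lock:
                    if link in seen:
                        continue
                    seen.add(link)
                url_queue.put(link)
        url_queue.task_done()

seen.add('/index')
url_queue.put('/index')
for _ in range(4):
    threading.Thread(target=crawl, daemon=True).start()
url_queue.join()   # blocks until every queued URL has been processed
print('dead links:', dead)  # dead links: ['/missing']
```

Note that the Queue synchronizes the work distribution, but the seen set and dead list are ordinary shared data and still need their own lock.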
A lock can be held by at most one thread at a time. Once a thread acquires it, no other thread may acquire it until it is released. For a lock to protect a set of variables, every code block that touches those variables must be surrounded by acquire and release calls on the same lock.
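A small sketch of this discipline, protecting a shared counter with threading.Lock (the with statement acquires the lock on entry and releases it on exit):

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Acquire/release surround every read-modify-write of counter.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: no updates lost
```

Without the lock, the += (a read followed by a write) could interleave across threads and lose updates, just as in the x = x * 2 example above.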
Another way to avoid conflicting access to shared data is to divide a program into phases, ensuring that shared data is mutated in a phase in which no other thread accesses it. Barriers divide a program into phases by requiring all threads to reach it before any of them can proceed. Code that is executed after a barrier cannot be concurrent with code executed before the barrier.
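The phase structure can be sketched with threading.Barrier. In this illustrative example, each thread writes only its own slot in phase one, waits at the barrier, and only then reads the slots written by the others:

```python
import threading

N = 3
squares = [0] * N
totals = [0] * N
barrier = threading.Barrier(N)

def worker(i):
    squares[i] = i * i        # phase 1: each thread writes only its own slot
    barrier.wait()            # no thread proceeds until all have arrived
    totals[i] = sum(squares)  # phase 2: every slot is now safely readable

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(totals)  # [5, 5, 5]
```

No locks are needed: within each phase, no two threads touch the same slot, and the barrier guarantees the phases do not overlap.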
Another way is to entirely avoid concurrent access to the same data. Let computations behave independently, but give them a way to send messages to each other to coordinate.
Incorrectly used synchronization methods can result in the following pitfalls.
Under-synchronization: neglecting to synchronize shared accesses leaves race conditions in the program.
Over-synchronization: serializing operations that do not actually conflict prevents them from running concurrently, reducing the program's efficiency.
Deadlock is a situation in which two or more threads or processes are stuck, waiting for each other to finish. No process can continue because it is waiting for other processes that are waiting for it to complete.
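The classic two-lock deadlock, and the standard way to avoid it, can be sketched as follows. If one thread acquired lock_a then lock_b while another acquired lock_b then lock_a, each could end up holding the lock the other needs. Imposing a single global acquisition order, as both threads do below, removes the cycle (the names are illustrative):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
log = []

def transfer(name):
    # Every thread acquires the locks in the same order: lock_a, then
    # lock_b. If another thread instead acquired lock_b first, the two
    # could each wait forever on the lock the other holds: deadlock.
    with lock_a:
        with lock_b:
            log.append(name)

t1 = threading.Thread(target=transfer, args=('t1',))
t2 = threading.Thread(target=transfer, args=('t2',))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(log))  # ['t1', 't2']: both threads complete
```

Consistent lock ordering is the simplest deadlock-avoidance strategy; others include acquiring all locks atomically or using timeouts and retrying.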