Concurrency, or parallel computing, allows for multiple operations to be performed simultaneously. To make it seem that programs are running concurrently, the CPU rapidly switches between each program, doing work in one process before switching to the next. Parallel computing is used to increase the speed of processors, but due to physical heat and energy limitations, CPU manufacturers utilize the idea of multi-core processors to achieve more processing power.
Parallelism in Python
In threading, multiple "threads" of execution exist within a single interpreter. Each thread executes code independently from the others, though they share the same data.
>>> import threading >>> def thread_hello(): other = threading.Thread(target=thread_say_hello, args=()) other.start() thread_say_hello() >>> def thread_say_hello(): print('hello from', threading.current_thread().name) >>> thread_hello() hello from Thread-1 hello from MainThread
Multiprocessing allows a program to spawn multiple interpreters, or processes, each of which can run code independently, without sharing data so any shared state must be communicated between processes.
>>> import multiprocessing >>> def process_hello(): other = multiprocessing.Process(target=process_say_hello, args=()) other.start() process_say_hello() >>> def process_say_hello(): print('hello from', multiprocessing.current_process().name) >>> process_hello() hello from MainProcess hello from Process-1
Synchronized data structures
Queue is an example of a data structure that provides synchronized operations. The
queue module contains a
Queue class that provides synchronized first in, first out access to data. The
put method adds an item to the
Queue and the
get method retrieves an item (at the end of the Queue). The class ensures methods are synchronized, so items are not lost no matter how thread operations are intertwined.
An example of a producer/consumer
queue = Queue() def synchronized_consume(): while True: print('got an item:', queue.get()) queue.task_done() def synchronized_produce(): consumer = threading.Thread(target=synchronized_consume, args=()) consumer.daemon = True consumer.start() for i in range(10): queue.put(i) queue.join() synchronized_produce()
Another use of a
Queue is a parallel web crawler that searches for dead links on a website. By following every link hosted by the same site, it continually adds new URLs to a
Queue, performs a process on that item, and removes it from the
Queue afterwards. By using a synchronized
Queue, multiple threads can safely add to and remove from the data structure concurrently.
When a synchronized version of a particular data structure is not available, we have to provide our own synchronization. A solution to this is a lock, which allows a data structure to be acquired by at most one thread, after which no other thread may acquire it until it is released by the thread that previously acquired it.
Another way to avoid conflicting access to shared data is to divide a program into phases, ensuring that shared data is mutated in a phase in which no other thread accesses it. Barriers divide a program into phases by requiring all threads to reach it before any of them can proceed. Code that is executed after a barrier cannot be concurrent with code executed before the barrier.
Another way is to entirely avoid concurrent access to the same data. Using multiprocessing in Python rather than threading naturally results in this, since processes run in separate interpreters with their own data.
Incorrectly used synchronization methods can result in the following pitfalls.
Neglecting to properly synchronize shared accesses.
Over-synchronizing a program means that non-conflicting operations cannot occur concurrently.
Deadlock is a situation in which two or more threads or processes are stuck, waiting for each other to finish. No process can continue because it is waiting for other processes that are waiting for it to complete.