From CS 61A Wiki
Revision as of 14:23, 9 August 2014 by Stevencheng (Talk | contribs)


Concurrency, or parallel computing, allows multiple operations to be performed simultaneously. In the past, the speed of individual processor cores grew at an exponential rate, but due to power and thermal constraints, that growth came to an abrupt end. Since then, CPU manufacturers have placed multiple cores in a single processor, allowing more operations to be performed concurrently.

Parallelism in Python


In threading, multiple "threads" of execution exist within a single interpreter. Each thread executes code independently from the others, though they share the same data.

>>> import threading
>>> def thread_hello():
        other = threading.Thread(target=thread_say_hello, args=())
        other.start()       # start the new thread
        thread_say_hello()  # say hello from the current thread too
>>> def thread_say_hello():
        print('hello from', threading.current_thread().name)
>>> thread_hello()
hello from Thread-1
hello from MainThread


Multiprocessing allows a program to spawn multiple interpreters, or processes, each of which can run code independently. Processes do not share data, so any shared state must be communicated explicitly between them.

>>> import multiprocessing
>>> def process_hello():
        other = multiprocessing.Process(target=process_say_hello, args=())
        other.start()        # start the new process
        process_say_hello()  # say hello from the current process too
>>> process_hello()
hello from MainProcess
>>> hello from Process-1

Synchronized Data Structures

A Queue is an example of a data structure that provides synchronized operations. The queue module contains a Queue class that provides synchronized first-in, first-out access to data. The put method adds an item to the Queue, and the get method removes and returns the item at the front of the Queue. The class ensures that its methods are synchronized, so items are not lost no matter how thread operations are interleaved.

An example of a producer/consumer Queue:

from queue import Queue
import threading

queue = Queue()
def synchronized_consume():
    while True:
        print('got an item:', queue.get())
        queue.task_done()   # mark the item as processed
def synchronized_produce():
    consumer = threading.Thread(target=synchronized_consume, args=())
    consumer.daemon = True
    consumer.start()
    for i in range(10):
        queue.put(i)
    queue.join()            # wait until every item has been consumed
synchronized_produce()

Another use of a Queue is a parallel web crawler that searches for dead links on a website. By following every link hosted by the same site, it continually adds new URLs to a Queue, processes each one, and removes it from the Queue afterwards. Because the Queue is synchronized, multiple threads can safely add items to and remove items from the data structure concurrently.
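The crawler's worker loop can be sketched as follows. This is a minimal illustration, not the crawler itself: check_link is a hypothetical stand-in for fetching a URL and recording whether it is dead, and the seed URLs are made up.

```python
import threading
from queue import Queue

checked = []
checked_lock = threading.Lock()

def check_link(url):
    # Hypothetical stand-in: a real crawler would fetch the URL,
    # record whether it is dead, and enqueue newly discovered links.
    with checked_lock:
        checked.append(url)

def worker(queue):
    while True:
        url = queue.get()   # blocks until an item is available
        check_link(url)
        queue.task_done()   # lets queue.join() track completion

queue = Queue()
for _ in range(4):          # several threads share one Queue safely
    t = threading.Thread(target=worker, args=(queue,))
    t.daemon = True
    t.start()

for url in ['/a', '/b', '/c']:
    queue.put(url)

queue.join()                # wait until every queued URL is handled
```

Because get, put, and task_done are synchronized, no URL is lost or processed twice regardless of how the four worker threads interleave.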

Avoiding Improper Mutation of Shared Data


When a synchronized version of a particular data structure is not available, we have to provide our own synchronization. One solution is a lock, which can be acquired by at most one thread at a time; no other thread may acquire it until it is released by the thread that holds it.


Another way to avoid conflicting access to shared data is to divide a program into phases, ensuring that shared data is mutated only in a phase in which no other thread accesses it. A barrier divides a program into phases by requiring all threads to reach it before any of them can proceed. Code executed after a barrier cannot be concurrent with code executed before the barrier.
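A minimal sketch of the phase idea using threading.Barrier; the two-slot results list is an invented example. Each thread writes only its own slot before the barrier, so reads after the barrier see every write without any conflicting access.

```python
import threading

results = [0, 0]
barrier = threading.Barrier(2)  # both threads must arrive before either proceeds

def worker(i):
    results[i] = i + 1     # phase 1: each thread writes only its own slot
    barrier.wait()         # wait for every thread to finish phase 1
    total = sum(results)   # phase 2: safe to read all of results
    print('thread', i, 'sees', total)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```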

Message Passing

Another approach is to avoid concurrent access to the same data entirely. Using multiprocessing in Python rather than threading naturally results in this, since processes run in separate interpreters with their own data.
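Processes can still cooperate by passing messages instead of sharing memory. A minimal sketch using multiprocessing.Pipe, with a made-up worker that echoes its message back in upper case:

```python
import multiprocessing

def worker(conn):
    # Receive a message from the parent, send back a reply, and exit.
    msg = conn.recv()
    conn.send(msg.upper())
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send('hello')   # message passing: no memory is shared
    print(parent_conn.recv())
    p.join()
```

Because each side only sees the data it is explicitly sent, there is no shared state to synchronize.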

Synchronization Pitfalls

Incorrectly used synchronization methods can result in the following pitfalls.

Under-synchronization: neglecting to properly synchronize shared accesses allows threads to interleave and corrupt shared data.

Over-synchronization: synchronizing more of a program than necessary means that non-conflicting operations cannot occur concurrently, reducing parallelism.

Deadlock: a situation in which two or more threads or processes are stuck, waiting for each other to finish. No process can continue because it is waiting for other processes that are, in turn, waiting for it to complete.
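The classic deadlock scenario involves two locks: if one thread acquires lock a and then waits for lock b while another holds b and waits for a, neither can ever proceed. A common fix is to impose a global lock ordering, as in this small sketch (the locks and the transfer function are invented for illustration):

```python
import threading

a = threading.Lock()
b = threading.Lock()
result = []

# If one thread acquired a then b while another acquired b then a,
# each could end up holding one lock while waiting forever for the
# other. Acquiring the locks in the same order everywhere prevents
# the circular wait that deadlock requires.
def transfer(name):
    with a:       # always acquire a first...
        with b:   # ...then b, in every thread
            result.append(name)

threads = [threading.Thread(target=transfer, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(result))  # [0, 1]
```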