Concurrency

From CS 61A Wiki
Revision as of 12:49, 10 August 2014 by Stevencheng (Talk | contribs)


Concurrency, or parallel computing, allows multiple operations to be performed simultaneously. On a single processor, the illusion of concurrency is created by rapidly switching the CPU between programs, doing some work in one process before moving on to the next. True parallelism requires multiple processing units: because physical heat and energy limitations cap how fast a single processor can run, CPU manufacturers instead build multi-core processors to achieve more processing power.

Parallelism in Python

Threading

In threading, multiple "threads" of execution exist within a single interpreter. Each thread executes code independently from the others, though they share the same data.

>>> import threading
>>> def thread_hello():
        other = threading.Thread(target=thread_say_hello, args=())
        other.start()
        thread_say_hello()
 
>>> def thread_say_hello():
        print('hello from', threading.current_thread().name)
 
>>> thread_hello()
hello from Thread-1
hello from MainThread

The two lines of output may appear in either order, since the operating system decides when each thread runs.

Multiprocessing

Multiprocessing allows a program to spawn multiple interpreters, or processes, each of which runs code independently. Processes do not share data, so any shared state must be communicated explicitly between them.

>>> import multiprocessing
>>> def process_hello():
        other = multiprocessing.Process(target=process_say_hello, args=())
        other.start()
        process_say_hello()
 
>>> def process_say_hello():
        print('hello from', multiprocessing.current_process().name)
 
>>> process_hello()
hello from MainProcess
>>> hello from Process-1

Note that the child's output appears after the interpreter has already printed its next prompt: the spawned process runs independently, so its output can arrive at any time.

Synchronized Data Structures

A Queue is an example of a data structure that provides synchronized operations. The queue module contains a Queue class that provides synchronized first-in, first-out access to data. The put method adds an item to the end of the Queue, and the get method retrieves and removes the item at the front. The class guarantees that its methods are synchronized, so items are not lost no matter how thread operations are interleaved.

An example of a producer/consumer Queue:

import threading
from queue import Queue

queue = Queue()
 
def synchronized_consume():
    while True:
        print('got an item:', queue.get())
        queue.task_done()
 
def synchronized_produce():
    consumer = threading.Thread(target=synchronized_consume, args=())
    consumer.daemon = True
    consumer.start()
    for i in range(10):
        queue.put(i)
    queue.join()
 
synchronized_produce()

Another use of a Queue is a parallel web crawler that searches for dead links on a website. By following every link hosted by the same site, it continually adds new URLs to a Queue, has worker threads process each URL, and marks each one complete afterwards. Because the Queue is synchronized, multiple threads can safely add and remove items concurrently.
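The worker pattern behind such a crawler can be sketched as follows. This is a minimal illustration, not the crawler itself: check_link is a hypothetical stand-in for real HTTP requests, and it simply records each URL so the example is self-contained.

```python
import threading
from queue import Queue

url_queue = Queue()
seen_lock = threading.Lock()
seen = set()

def check_link():
    while True:
        url = url_queue.get()    # blocks until an item is available
        with seen_lock:          # protect the shared set of results
            seen.add(url)        # a real crawler would fetch the URL here
        url_queue.task_done()    # mark this item as processed

def crawl(urls, num_workers=4):
    for _ in range(num_workers):
        worker = threading.Thread(target=check_link)
        worker.daemon = True     # workers exit when the main thread does
        worker.start()
    for url in urls:
        url_queue.put(url)
    url_queue.join()             # wait until every URL has been processed

crawl(['/index.html', '/about.html', '/404.html'])
```

queue.join returns only once task_done has been called for every item put on the Queue, which is how the main thread knows all work is finished.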

Avoiding Improper Mutation of Shared Data

Locks

When a synchronized version of a particular data structure is not available, we have to provide our own synchronization. One solution is a lock, which can be acquired by at most one thread at a time; every other thread that tries to acquire it blocks until the owning thread releases it. If every thread acquires the lock before mutating the shared data, conflicting accesses cannot overlap.
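A minimal sketch of this idea, using Python's threading.Lock to protect a shared counter (the counter and thread counts are chosen for illustration):

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with counter_lock:   # at most one thread holds the lock here
            counter += 1     # the read-modify-write is now atomic

threads = [threading.Thread(target=increment, args=(100000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, no increments are lost: counter == 200000
```

Without the lock, the two threads could interleave their reads and writes of counter and some increments would be lost.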

Barriers

Another way to avoid conflicting access to shared data is to divide a program into phases, ensuring that shared data is mutated only in a phase in which no other thread accesses it. Barriers divide a program into phases by requiring all threads to reach the barrier before any of them can proceed. Code that is executed after a barrier cannot be concurrent with code executed before the barrier.
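Python provides this mechanism as threading.Barrier. Below is a sketch of a hypothetical two-phase computation: in phase 1 each thread writes only its own slot, and after the barrier every thread can safely read all the results.

```python
import threading

NUM_THREADS = 3
barrier = threading.Barrier(NUM_THREADS)
inputs = [1, 2, 3]
squares = [0] * NUM_THREADS
totals = [0] * NUM_THREADS

def worker(i):
    # Phase 1: each thread mutates only its own slot of squares.
    squares[i] = inputs[i] ** 2
    barrier.wait()   # no thread proceeds until all finish phase 1
    # Phase 2: every phase-1 write is now complete and visible.
    totals[i] = sum(squares)

threads = [threading.Thread(target=worker, args=(i,))
           for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Every thread computes the same total: 1 + 4 + 9 == 14
```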

Message Passing

Another way is to avoid concurrent access to the same data entirely and instead communicate by passing messages. Using multiprocessing in Python rather than threading naturally results in this, since processes run in separate interpreters with their own data.
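A sketch of message passing with multiprocessing.Pipe: the parent sends a value to a child process, which replies with a result. The doubler function is a hypothetical worker for illustration. (On platforms that spawn rather than fork new processes, such as Windows, this code should additionally be placed under an if __name__ == '__main__' guard.)

```python
import multiprocessing

def doubler(conn):
    # Runs in the child process: receive a value, send back its double.
    value = conn.recv()
    conn.send(value * 2)
    conn.close()

parent_conn, child_conn = multiprocessing.Pipe()
child = multiprocessing.Process(target=doubler, args=(child_conn,))
child.start()
parent_conn.send(21)          # message passed to the child
result = parent_conn.recv()   # reply passed back from the child
child.join()
```

No data is shared between the interpreters; the only communication happens through the explicit send and recv calls.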

Synchronization Pitfalls

Incorrect use of synchronization methods can result in the following pitfalls.

Under-synchronization

Under-synchronization means neglecting to properly synchronize shared accesses. Threads may then interleave in ways that leave shared data in an inconsistent state, and such race conditions are often hard to reproduce.

Over-synchronization

Over-synchronizing a program means synchronizing more than necessary, so that non-conflicting operations cannot occur concurrently and the benefits of parallelism are lost.

Deadlock

Deadlock is a situation in which two or more threads or processes are stuck, waiting for each other to finish. No process can continue, because each is waiting for another that is in turn waiting for it.
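A classic deadlock arises when two threads each hold one of two locks and wait for the other. One common remedy, sketched below under illustrative assumptions (the locks and balances are hypothetical), is to make every thread acquire the locks in the same fixed order, so no circular wait can form.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
balance_a, balance_b = 100, 100

def transfer(amount):
    global balance_a, balance_b
    # Acquiring both locks in one fixed order (lock_a before lock_b)
    # ensures two threads can never each hold one lock while waiting
    # for the other, which is what causes deadlock.
    with lock_a:
        with lock_b:
            balance_a -= amount
            balance_b += amount

threads = [threading.Thread(target=transfer, args=(10,))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If one thread instead acquired lock_b first while another held lock_a, each could end up blocked on the lock the other holds, and neither would ever proceed.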

Source

Parallel Computing