From CS 61A Wiki
Revision as of 10:52, 18 June 2014 by Axis (Talk | contribs)

Jump to: navigation, search
Purge this page if the LaTeX typesetting doesn't render.

A tree is an abstract model of a hierarchical structure. It consists of nodes with a parent-child relationship. Its name comes from the fact that when drawn, it resembles an upside-down real tree — the root at the top and the leaves at the bottom. A tree is a recursive data structure; it is either the empty tree or a node that contains a label and references to subtrees. The label can be any data type.

In computer science, there are many different types of trees. In 61A, we will look at general trees and binary trees.


Here is a tree:

Binary tree.png
  • node: a point in the tree. Each node has a label (value at each node). In the figure, each circle represents a node.
  • root: the node at the top. Every tree has one root. In the figure, node F is the root.
  • parent: the node that is directly above a node. In the figure, node D is the parent of nodes C and E.
  • children: the nodes directly beneath a node. In the figure, nodes C and E are the children of node D.
  • inner node: a node with children. In the figure, nodes B, D, F, G, and I are inner nodes.
  • leaf: a node with no children. In the figure, nodes A, C, E, and H are leaves.
  • subtree: a node and all its descendants. In the figure, the tree rooted at node B is a subtree of node F.
  • depth: how far away a node is from the root. The root of a tree has depth 0. In the figure, node D has depth 2.
  • height: the depth of the lowest leaf. In other words, the length of the longest path from the root to a leaf. An empty tree has height 0. In the figure, the height is 3.


Here are some applications of trees:

  • hierarchical organization of real-world data
    • family trees, taxonomies
    • filesystem tree – used by operating systems to maintain the filesystem. A directory is the parent of the enclosed files/directories. Here is a UNIX filesystem tree:
Filesystem tree.png
  • data structures for efficiently storing and retrieving data
    • binary search tree – store objects in a sorted structure that allows for fast searching
  • decision trees – diagram a branching algorithmic process. Each node is a question, each branch is an answer to a question, and each leaf is an outcome. Here is an example:
Decision tree.png
  • parse tree – shows the structure of source code. Here is the parse tree for (2 + 6) / (9 % 5):
Expression tree.gif


The recursive structure of a tree suggests recursive algorithms for operations. Generally, the base case will be if the current node is a leaf. The recursive step will involve recursing on each subtree.

General tree

In a general tree, each node has a label and list of children. An empty list of children means that node is a leaf.



class GeneralTree:
    def __init__(self, label, children=[]):
        self.label = label
        self.children = [c for c in children]


def leaf_count(t):
    """Returns the number of leaves in tree T.
    >>> t = GeneralTree(6, [GeneralTree(9), GeneralTree(8, [GeneralTree(4)]), GeneralTree(7)])
    #        6
    #  +-----+-----+
    #  |     |     |
    #  9     8     7
    #        |
    #        4
    >>> leaf_count(t)
    if not t.children: # if t is a leaf
        return 1
    return sum(leaf_count(child) for child in t.children)



(define (make-tree entry children)
    (cons entry children))
(define (entry tree)
    (car tree))
(define (children tree)
    (cdr tree))

When working with general trees in Scheme, it is often necessary to write two mutually recursive functions: one that handles a single tree and one that handles a forest. In the Python example above, we were able to loop through each tree in the forest to recurse on it. In Scheme, we have to write a function specifically to operate on the forest. This is a common structure:

; handles a tree
(define (foo-tree tree)
    (foo-forest (children tree)))
; handles a forest
(define (foo-forest forest)
    (foo-tree (car forest))
    (foo-forest (cdr forest)))


(define (leaf-count-tree t)
    (if (null? (children t)) ; t is a leaf
        (leaf-count-forest (children t))))
(define (leaf-count-forest forest)
    (if (null? forest)
        (+ (leaf-count-tree (car forest)) (leaf-count-forest (cdr forest)))))

leaf-count-tree relies on leaf-count-forest to count the leaves from the children of the current node and leaf-count-forest relies on leaf-count-tree to count the leaves in each tree in the forest.

Note that this method still works in Python:

def leaf_count_tree(t):
    if not t.children: # t is a leaf
        return 1
    return leaf_count_forest(t.children)
def leaf_count_forest(forest):
    if not forest:
        return 0
    return leaf_count_tree(forest[0]) + leaf_count_forest(forest[1:])

Binary tree

A binary tree is a tree where each node has at most two children. Each child is called a left or right child. For example, the tree above is a binary tree. Node A is the left child of node B, and node D is the right child.


Implementation (includes a function that pretty-prints a binary tree):

class BinTree:
    >>> t = BinTree(6, BinTree(2, BinTree(0), BinTree(4)), BinTree(10, BinTree(8), BinTree(12)))
    >>> t
    BinTree(6, BinTree(2, BinTree(0), BinTree(4)), BinTree(10, BinTree(8), BinTree(12)))
    >>> print(t)
     /   \ 
     2   10
    / \ /  \
    0 4 8 12
    def __init__(self, entry, left=None, right=None):
        self.entry = entry
        self.left = left
        self.right = right
    def __repr__(self): # used by the interpreter
        args = repr(self.entry)
        if self.left or self.right:
            args += ', {0}, {1}'.format(repr(self.left), repr(self.right))
        return 'BinTree({0})'.format(args)
    def __str__(self): # used by print() and str()
        return tree_string(self)
# Tree printing functions, kindly provided by Joseph Hui. #
def tree_string(tree):
    return "\n".join(tree_block(tree)[0])
def tree_block(tree):
    """Returns a list of strings, each string being
    one line in a rectangle of text representing the
    Also returns the index in the string which is
    centered above the tree's root.
    num_width: The width of the widest number in the tree (100 => 3)
    #If no children, just return string
    if tree.left is None and tree.right is None:
        return [str(tree.entry)], len(str(tree.entry))//2
    num_width = len(str(tree.entry)) #Width of this tree's entry
    #If only right child:
    if tree.left is None:
        r_block, r_cent = tree_block(tree.right)
        #Add left padding if necessary
        if r_cent < num_width*3/2:
            padding = ' '*((num_width*3)//2-r_cent)
            r_cent += ((num_width*3)//2-r_cent) #Record new center
            for line_index in range(len(r_block)):
                r_block[line_index] = padding+r_block[line_index]
        #Generate top two lines
        t_line = str(tree.entry)
        t_line += '-'*(r_cent-len(t_line))
        t_line += ' '*(len(r_block[0])-len(t_line))
        m_line = ' '*r_cent + '\\'
        m_line += ' '*(len(r_block[0])-len(m_line))
        return [t_line, m_line]+r_block, num_width//2
    #If only left child:
    if tree.right is None:
        l_block, l_cent = tree_block(tree.left)
        #Add right padding if necessary
        if len(l_block[0]) < l_cent+1+num_width:
            padding = ' '*(l_cent+1+num_width-len(l_block[0]))
            for line_index in range(len(l_block)):
                l_block[line_index] = l_block[line_index]+padding
        #Generate lines
        t_line = ' '*(l_cent+1)
        t_line += '-'*(len(l_block[0])-l_cent-1-num_width)
        t_line += str(tree.entry)
        m_line = ' '*l_cent+'/'
        m_line += ' '*(len(l_block[0]) - len(m_line))
        return [t_line, m_line]+l_block, len(t_line)-num_width//2
    #Otherwise, has both
    l_block, l_cent = tree_block(tree.left)
    r_block, r_cent = tree_block(tree.right)
    #Pad left block
    spacing = r_cent+len(l_block)-l_cent-2
    padding = ' '*max(1, (spacing-num_width))
    for line_index in range(len(l_block)):
        l_block[line_index] = l_block[line_index]+padding
    #Add left and right blocks
    new_block = []
    for line_index in range(max(len(l_block), len(r_block))):
        new_line = l_block[line_index] if line_index < len(l_block) else ' '*len(l_block[0])
        new_line += r_block[line_index] if line_index < len(r_block) else ' '*len(r_block[0])
    r_cent += len(l_block[0])
    #Generate top lines
    my_cent = (l_cent+r_cent)//2
    t_line = ' '*(l_cent+1)
    t_line += '-'*(my_cent-num_width//2-len(t_line))
    t_line += str(tree.entry)
    t_line += '-'*(r_cent-len(t_line))
    t_line += ' '*(len(new_block[0])-len(t_line))
    m_line = ' '*l_cent + '/'
    m_line += ' '*(r_cent-len(m_line)) + '\\'
    m_line += ' '*(len(new_block[0])-len(m_line))
    return [t_line, m_line]+new_block, my_cent

An empty tree is represented by None.

Here is a function that searches a binary tree for an element:

def find(t, elem):
    """Returns true if ELEM is an entry in binary tree T."""
    if not t: # t is an empty tree
        return False
    if elem == t.entry:
        return True
    return find(t.left, elem) or find(t.right, elem) # look in both subtrees



(define (tree entry left right)
    (list entry left right))
(define (entry tree)
    (car tree))
(define (left tree)
    (car (cdr tree)))
(define (right tree)
    (car (cdr (cdr tree))))

An empty tree is represented by nil or '().

Here is a function that searches a binary tree for an element:

(define (find t elem)
    (cond ((null? t) #f)
          ((= elem (entry t)) #t)
          (else (or (find (left t) elem) (find (right t) elem)))))

Binary search tree (BST)

The binary search tree (BST) is a special type of binary tree that satisfies the BST property: for any node n, every key in n's left subtree is less than n's key, and every key in n's right subtree is greater than n’s. No duplicate keys are allowed. Here is an example of a BST:

BST example.png
Balanced vs. unbalanced BST

The advantage of the BST is that the major operations (search, insert, and remove) run in $\Theta(\log n)$ time if the BST is balanced, where $n$ is the number of nodes in the BST.[1] A BST is balanced if:

  1. the number of nodes in its left branch differs from the number of nodes in its right branch by at most 1
  2. its non-empty branches are also balanced trees

However, if the BST is imbalanced (e.g., every node has only 1 child), the runtime of search, insert, and remove is $\Theta(n)$ at worst.

Because of their efficiency as a data repository, BSTs are used to implement data structures such as sets or dictionaries. For sets, the elements would be keys. For dictionaries, the key-value pairs would be keys.

Here is a Python function for searching a BST for an element:

def find(t, elem):
    """Returns true if ELEM is an entry in BST T."""
    if not t: # t is an empty tree
        return False
    if elem == t.entry:
        return True
    if elem < t.entry:
        return find(t.left, elem) # look in left subtree
    return find(t.right, elem) # look in right subtree

Notice that we take advantage of the BST property by recursing only on one subtree instead of both.


  1. A balanced BST has $\Theta(\log n)$ height. Searching, insertion, and removal are operations that traverse the BST's height, so these operations have a runtime of $\Theta(\log n)$.