Difference between revisions of "Tree"

 [checked revision] [checked revision]

A tree is an abstract model of a hierarchical structure. It consists of nodes with a parent-child relationship. Its name comes from the fact that when drawn, it resembles an upside-down real tree — the root at the top and the leaves at the bottom. A tree is a recursive data structure; it is either the empty tree or a node that contains a label and references to subtrees. The label can be any data type.

In computer science, there are many different types of trees. In 61A, we will look at general trees and binary trees.

Terminology

Here is a tree:

• node: a point in the tree. Each node has a label (value at each node). In the figure, each circle represents a node.
• root: the node at the top. Every tree has one root. In the figure, node F is the root.
• parent: the node that is directly above a node. In the figure, node D is the parent of nodes C and E.
• children: the nodes directly beneath a node. In the figure, nodes C and E are the children of node D.
• inner node: a node with children. In the figure, nodes B, D, F, G, and I are inner nodes.
• leaf: a node with no children. In the figure, nodes A, C, E, and H are leaves.
• subtree: a node and all its descendants. In the figure, the tree rooted at node B is a subtree of node F.
• depth: how far away a node is from the root. The root of a tree has depth 0. In the figure, node D has depth 2.
• height: the depth of the lowest leaf. In other words, the length of the longest path from the root to a leaf. An empty tree has height 0. In the figure, the height is 3.

Applications

Here are some applications of trees:

• hierarchical organization of real-world data
• family trees, taxonomies
• filesystem tree – used by operating systems to maintain the filesystem. A directory is the parent of the enclosed files/directories. Here is a UNIX filesystem tree:
• data structures for efficiently storing and retrieving data
• binary search tree – store objects in a sorted structure that allows for fast searching
• decision trees – diagram a branching algorithmic process. Each node is a question, each branch is an answer to a question, and each leaf is an outcome. Here is an example:
• parse tree – shows the structure of source code. Here is the parse tree for (2 + 6) / (9 % 5):

Operations

The recursive structure of a tree suggests recursive algorithms for operations. Generally, the base case will be if the current node is a leaf. The recursive step will involve recursing on each subtree.

General tree

In a general tree, each node has a label and list of children. An empty list of children means that node is a leaf.

Python

def tree(datum, children=[]):
# YOU DON'T KNOW THE IMPLEMENTATION

def datum(tree):
# YOU DON'T KNOW THE IMPLEMENTATION

def children(tree):
# YOU DON'T KNOW THE IMPLEMENTATION

Example:

def leaf_count(t):
"""Returns the number of leaves in tree T.
>>> t = tree(6, [tree(9), tree(8, [tree(4)]), tree(7)])
#        6
#  +-----+-----+
#  |     |     |
#  9     8     7
#        |
#        4
>>> leaf_count(t)
3
"""
if not children(t) == []: # if t is a leaf
return 1
return sum(leaf_count(child) for child in children(t))

Scheme

Implementation:

(define (make-tree entry children)
(cons entry children))
(define (entry tree)
(car tree))
(define (children tree)
(cdr tree))

When working with general trees in Scheme, it is often necessary to write two mutually recursive functions: one that handles a single tree and one that handles a forest. In the Python example above, we were able to loop through each tree in the forest to recurse on it. In Scheme, we have to write a function specifically to operate on the forest. This is a common structure:

; handles a tree
(define (foo-tree tree)
...
(foo-forest (children tree)))

; handles a forest
(define (foo-forest forest)
...
(foo-tree (car forest))
...
(foo-forest (cdr forest)))

Example:

(define (leaf-count-tree t)
(if (null? (children t)) ; t is a leaf
1
(leaf-count-forest (children t))))

(define (leaf-count-forest forest)
(if (null? forest)
0
(+ (leaf-count-tree (car forest)) (leaf-count-forest (cdr forest)))))

leaf-count-tree relies on leaf-count-forest to count the leaves from the children of the current node and leaf-count-forest relies on leaf-count-tree to count the leaves in each tree in the forest.

Note that this method still works in Python:

def leaf_count_tree(t):
if not t.children: # t is a leaf
return 1
return leaf_count_forest(t.children)

def leaf_count_forest(forest):
if not forest:
return 0
return leaf_count_tree(forest[0]) + leaf_count_forest(forest[1:])

Binary tree

A binary tree is a tree where each node has at most two children. Each child is called a left or right child. For example, the tree above is a binary tree. Node A is the left child of node B, and node D is the right child.

Python

Implementation (includes a function that pretty-prints a binary tree):

An empty tree is represented by None.

Here is a function that searches a binary tree for an element:

def find(t, elem):
"""Returns true if ELEM is an entry in binary tree T."""
if not t: # t is an empty tree
return False
if elem == t.entry:
return True
return find(t.left, elem) or find(t.right, elem) # look in both subtrees

Scheme

Implementation:

(define (tree entry left right)
(list entry left right))

(define (entry tree)
(car tree))

(define (left tree)
(car (cdr tree)))

(define (right tree)
(car (cdr (cdr tree))))

An empty tree is represented by nil or '().

Here is a function that searches a binary tree for an element:

(define (find t elem)
(cond ((null? t) #f)
((= elem (entry t)) #t)
(else (or (find (left t) elem) (find (right t) elem)))))

Binary search tree (BST)

The binary search tree (BST) is a special type of binary tree that satisfies the BST property: for any node n, every key in n's left subtree is less than n's key, and every key in n's right subtree is greater than n’s. No duplicate keys are allowed. Here is an example of a BST:

Balanced vs. unbalanced BST

The advantage of the BST is that the major operations (search, insert, and remove) run in $\Theta(\log n)$ time if the BST is balanced, where $n$ is the number of nodes in the BST.[1] A BST is balanced if:

1. the number of nodes in its left branch differs from the number of nodes in its right branch by at most 1
2. its non-empty branches are also balanced trees

However, if the BST is imbalanced (e.g., every node has only 1 child), the runtime of search, insert, and remove is $\Theta(n)$ at worst.

Because of their efficiency as a data repository, BSTs are used to implement data structures such as sets or dictionaries. For sets, the elements would be keys. For dictionaries, the key-value pairs would be keys.

Here is a Python function for searching a BST for an element:

def find(t, elem):
"""Returns true if ELEM is an entry in BST T."""
if not t: # t is an empty tree
return False
if elem == t.entry:
return True
if elem < t.entry:
return find(t.left, elem) # look in left subtree
return find(t.right, elem) # look in right subtree

Notice that we take advantage of the BST property by recursing only on one subtree instead of both.

Footnotes

1. A balanced BST has $\Theta(\log n)$ height. Searching, insertion, and removal are operations that traverse the BST's height, so these operations have a runtime of $\Theta(\log n)$.