wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi) riddles >> hard >> Generating Random Trees (Message started by: william wu on Sep 4th, 2003, 1:27am)

Title: Generating Random Trees
Post by william wu on Sep 4th, 2003, 1:27am
Described below are two ways for generating a random binary tree with n nodes. Although they seem very different, they are surprisingly equivalent -- that is, both methods produce the same probability distribution on binary trees with n nodes. The problem here is to prove that they are equivalent.

Method 1

- First expand the root node: /\
- Now expand one of the two terminal nodes at random, so you end up with either

/\
/\

or

/\
/\

- Continue in this fashion at each step, choosing one of the terminal nodes of the tree equally at random, and expanding it. Keep doing this until you have a tree with n terminal nodes.

Method 2

- Choose an integer N1 uniformly distributed between 1 and (n-1) inclusive. Now starting with the root node, expand it and assign N1 to the left child, and n - N1 to the right child:

/\
N1  n-N1

- Now choose an integer N2 uniformly distributed between 1 and (N1-1) inclusive. Also, independently choose an integer N3 uniformly distributed between 1 and ((n-N1)-1) inclusive. Expanding in the same fashion, we now have the following tree:

http://www.ocf.berkeley.edu/~wwu/images/riddles/random_tree.jpg

- Continue expanding in this fashion until no more subdivisions are possible at any terminal node. The result is your random tree.

Title: Re: Generating Random Trees
Post by James Fingas on Sep 16th, 2003, 1:31pm
I think it is sufficient to show that method 1 divides the new nodes into the left and right branches of the root node with the same probability distribution as the second case. The rest can be proved recursively from that.

I'll keep thinking on it.

Title: Re: Generating Random Trees
Post by william wu on Sep 29th, 2003, 11:01pm
I don't know the solution myself. The hint for this problem is "Polya's Urn Model", which I am unfamiliar with.

Title: Re: Generating Random Trees
Post by visitor on Sep 30th, 2003, 11:17am
What are the chances that your final tree will have one long original branch that never branches further. In the second method it's simply 2/n, the odds that you'll pick either 1 or n-1. In the first method, the first branch creates two trunks, the second eliminates one. The odds are 2/3 that this trunk will remain unbranched in the next division; times 3/4 for the division after that; times 4/5 for the division after that. The middle numbers will all cancel out, leaving 2/n.
From any end node that is reached in method two, you can back up to the branching that created it and count the number of times the other branch was further subdivided, and the odds must, once again be identical, since, at the time when this node was created, you were simply dealing with a number that acts just like our top-of-the-tree number.

Title: Re: Generating Random Trees
Post by visitor on Sep 30th, 2003, 11:19am
I meant 2/(n-1), of course.

Title: Re: Generating Random Trees
Post by James Fingas on Oct 17th, 2003, 1:04pm
Continuing from my first post, where I state that the two models are equivalent iff they produce the same probability distribution for the number of nodes in the left and right subtrees of the root node. This property is easily verified as follows:
[hide]
The second method obviously produces a uniform distribution on the number of nodes in the left and right subtrees. That is to say, if there are n nodes to be distributed, the left subtree of the root will have an equal chance of having 1 or 2 or 3 ... or n-1 of them.

What remains is to show that this is true of the first method. This is quite easy. The first step is always to put 1 node each into the left and right subtrees. So we start with the following tree:

o o

The next step will, with equal probability, split either the left or the right subtree up:

(o o) o  (p = 0.5)
o (o o)  (p = 0.5)

The next step:

((o o) o) o  (p = 1/3 * 0.5)
(o (o o)) o  (p = 1/3 * 0.5)
(o o) (o o)  (p = 1/3 * 0.5)

(o o) (o o)  (p = 1/3 * 0.5)
o ((o o) o)  (p = 1/3 * 0.5)
o (o (o o))  (p = 1/3 * 0.5)

You can see that there are only three distinct possibilities for the distribution of nodes between the left and right subtrees, and that they all have the same probability. Now I'll skip over the diagrams and do this with math:

Define p(n,k) to be the probability that there are k nodes in the left hand subtree of the root node when there are n leaf nodes in the entire tree.

p(2,1) = 1

Evidently, when we start with a scenario p(n,k) and add another leaf node, we have a k/n probability of getting k+1 in the left, and an (n-k)/n probability of getting k. Looking at it the other way, p(n+1,k) = p(n,k-1)*(k-1)/n + p(n,k)*(n-k)/n. If we assume that p(n,k) = 1/(n-1) for 1 <= k < n, then we get:

p(n+1,k) = 1/(n-1)*(k-1)/n + 1/(n-1)*(n-k)/n
= (k-1+n-k)/n/(n-1)
= (n-1)/n/(n-1)
= 1/n

This shows inductively that p(n,k) = 1/(n-1), with the observation that although p(n,0) was called for in the inductive step (and is zero, not 1/(n-1)), it was multiplied by 0 so the answer is still valid. [/hide]

So the two methods distribute nodes to the left and right subtrees of the root node in the same manner.