wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
riddles >> hard >> 100 tails
(Message started by: NickH on Apr 30th, 2004, 8:30pm)

Title: 100 tails
Post by NickH on Apr 30th, 2004, 8:30pm
A fair coin is tossed until it shows tails 100 times.  

Given that the longest run of consecutive heads (which may have occurred one or more times) was five heads, what is the expected number of heads shown?  What if the longest run of consecutive heads had been 10?

Title: Re: 100 tails
Post by william wu on Aug 3rd, 2004, 2:30pm
We have a sequence of heads and tails, with exactly 100 tails, where the last flip is a tail. In between each tail, we can have at most k heads.

Let X = total number of heads shown. Then we can express X as the sum of 100 random variables X_1 through X_100, where X_i corresponds to the number of consecutive heads in between the (i-1)th and ith tails. Note that X_1 is then the number of heads at the start of the sequence, which could be 0 if the sequence starts with a tail.

We then wish to compute

     E(X | max(X_i) = k) = E( X_1 + ... + X_100 | max(X_i) = k)

for some constant k (5 or 10 in NickH's problem statement).

With the conditioning, at least one of the X_i's must be exactly k. Without loss of generality, let this be X_100 (I think that by commutativity of addition it doesn't matter for which X_i this maximum occurs). Pulling X_100 out of the expecation, we then have
     
     E(X | max(X_i) = k) = k + E( X_1 + ... + X_99 | X1 through X99 are each <= k)

The remaining variables X_1 through X_99 are now upper-bounded by k. We then apply linearity of expectation and compute conditional expectations:

     E(X | max(X_i) = k)
     = k + [sum]i=1 to 99E[Xi | Xi <= k]       
     = k + 99 * E[X1 | X1 <= k]       

To compute this expectation we will have to normalize by the prior probability Pr(X1 <= k). Recall that X_1 is the number of flips between two tails. Thus X_1 follows a geometric distribution, but shifted "left" by one flip (and thus has expectation "1" for a fair coin, rather than "2"). For instance, Pr(X_1 = 0) = 1/2, Pr(X_1 = 1) = (1 - 1/2)*(1/2), etc. In general,
     
     Pr(X_1 = m) = (1/2)m+1  for m = 0, 1, 2, ...

To compute Pr(X_1 <= k), we thus have
     
     Pr(X_1 <= k) = [sum]m = 0 to k Pr(X_1 = m) = [sum]m = 0 to k (1/2)^(m+1) = 1 - (1/2)k+1.

Thus,
           
     E[X1 | X1 <= k] = (1 - (1/2)k+1)-1 * [sum]m = 0 to k m (1/2)m+1

which after some annoying manipulation of geometric series identities (take the derivative) and algebraic simplification, is reducible to
           
     E[X1 | X1 <= k] = (2k+1 - k - 2)/(2k+1 - 1).

Checking some values of k shows that this expression agrees with intuition. When k is very large, we are conditioning on a very rare event (k tails in a row), and the expectation should agree with the unconditioned geometric expectation; indeed, as k is large, the conditional expectation approaches 1. And as k is smaller, the conditional expectation is less than 1. For k = 2, the conditional expectation is 0.5714.

Thus the final result is
           
     E(X | max(X_i) = k)  = k + 99*(2k+1 - k - 2)/(2k+1 - 1)


A plot of this as a function of k is shown below. (It should be a discrete plot but the points have been interpolated for illustrative purposes.) For k = 5 and 10, we have E(X | max(X_i) = 5)  = 94.5714 and E(X | max(X_i) = 10)  = 108.468. Note that if we did not know anything about the maximal run length, then the expected head count would simply be 100. At around k = 8, the conditioned event X_i <= k has such high probability that this conditioning has little effect on our expected head count. Afterwards we see a virtually linear growth rate due to the linear growth of the "X_100 = k" term out in front, and the fact that the other term ((2k+1 - k - 2)/(2k+1 - 1)) is roughly equal to 1. This can be better seen in the second plot (see second posting), which replaces the X_100 term with a 1.

Title: Re: 100 tails
Post by william wu on Aug 3rd, 2004, 2:32pm
The second graph, attached below

Private mental thought: Perhaps someone will catch an error in all this and these plots will be for naught :)

Title: Re: 100 tails
Post by Rezyk on Aug 6th, 2004, 11:24pm

Quote:
With the conditioning, at least one of the X_i's must be exactly k. Without loss of generality, let this be X_100 (I think that by commutativity of addition it doesn't matter for which X_i this maximum occurs). Pulling X_100 out of the expecation, we then have
     
     E(X | max(X_i) = k) = k + E( X_1 + ... + X_99 | X1 through X99 are each <= k)


I believe that WLOG doesn't hold there.  Consider the simpler scenario of flipping until 2 tails are accumulated, with the longest run of heads being 1.  The equivalent argument would give

E(X_1+X_2 | max(X_1,X_2)=1) = 1+E(X_1 | X1<=k)
6/5 = 4/3

Title: Re: 100 tails
Post by william wu on Aug 7th, 2004, 2:38am
Thanks for the heads-up Rezyk. I got 6/5 as well from considering the possible sequences HTHT, HTT, and THT with respective posteriori probabilities of 0.2,0.4, and 0.4 respectively. However, I'm having trouble figuring out the problem with this line, if anything:

E[ X1 + ... + Xn | max(X1,...,Xn) = k ] = k + E[ X1 + ... + Xn-1 | each X_i <= k ]

Even if we consider the many possible slots in which the maximum occurs, all are equally likely, so the end result is the right-hand side. Hmm.

Also I did a Monte Carlo simulation of the process which stops after 2 tails, and the maximum head runlength is 1. But I'm not getting 6/5, nor 4/3.  Rather, I get around 1.13, and inexplicably, there are over twice more THT strings than HTT strings. Maybe my program still has a bug in it; but if someone does his/her own experiment, do tell what you get.

Title: Re: 100 tails
Post by asterix on Aug 7th, 2004, 3:20pm
The math to tell you where the problem might be is beyond me, but I did a Monte Carlo and got the logical 1.2 for that simple example. When tails was set at 100 I got: for
k : Result
3 : 74
4 : 84
5 : 92
6 : 97.5
7 : 102
8 : 105
9 : 107
10 : 108.5
11 : 110
12 : 111

Title: Re: 100 tails
Post by Rezyk on Aug 8th, 2004, 9:46pm

Quote:
Even if we consider the many possible slots in which the maximum occurs, all are equally likely, so the end result is the right-hand side. Hmm.


If each "this slot is maximum" event were mutually exclusive, then the WLOG argument would follow...

Title: Re: 100 tails
Post by william wu on Aug 9th, 2004, 10:17pm
Rezyk, good point. The statement

E(X | max(X_i) = k) = k + E( X_1 + ... + X_99 | X1 through X99 are each <= k)  

is convincing at first, but subtly false, since the events

X_1 <= k, X_2 <= k, ...., X_(n-1) <=k, X_n = k
X_1 <= k, X_2 <= k, ...., X_(n-1) =k, X_n <= k
...
X_1 = k, X_2 <= k, ...., X_(n-1) <=k, X_n <= k

are not mutually exclusive. Namely, there can be ties. Although it is interesting to note that if we were working with continuous distributions, the statement above would be true. Anyhow, so now we may have to do some inclusion-exclusion stuff, which I'm too lazy to think about at the moment.



Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board