Author 
Topic: infectious disease (Read 1750 times) 

tomas
Newbie
Posts: 8


infectious disease
« on: Dec 5^{th}, 2008, 8:26am » 
Quote Modify

In the town with the population of n people, there is an infectious disease.So far there has been 1% of the residents being infected. Provided that there is a testing centre which can test up to 10^4 samples per day. the centre can test either a combination of blood samples or separate ones however , if there is more than 1 infectious sample in a msized sample (n>m>1), the result will be wrong.else, if there is 0 or 1 infectious sample in a msized sample, the result will be correct. How many days does it take to find out all the infected people

« Last Edit: Dec 5^{th}, 2008, 8:32pm by tomas » 
IP Logged 



teekyman
Full Member
Gender:
Posts: 199


Re: infectious disease
« Reply #1 on: Dec 5^{th}, 2008, 8:42am » 
Quote Modify

Did you mean at least one infected sample makes the test detect the infection, or did you really mean more than 1? Also, can the center process 10^4 people, or samples? (I'm also assuming that n >> 10^4 because we're probably not looking for a naive solution.)


IP Logged 



tomas
Newbie
Posts: 8


Re: infectious disease
« Reply #2 on: Dec 5^{th}, 2008, 8:52am » 
Quote Modify

if there is only 1 infectious sample in the msized sample , then the result will be correct. else , there are > 1 infectious samples , it will be wrong the centre can process a msized sample (combination of m people's blood sample, here 1<=m<=10^4) people's blood sample= people (the same ) of course , n >10^4 let's say ,n > infinity but the strategy is unchanged for a random value of n

« Last Edit: Dec 5^{th}, 2008, 8:53am by tomas » 
IP Logged 



teekyman
Full Member
Gender:
Posts: 199


Re: infectious disease
« Reply #3 on: Dec 5^{th}, 2008, 10:08am » 
Quote Modify

Well if having two or more infected people donate samples causes the test to give a false negative infection, then that will complicate things indeed. In this situation since we know that a diseasepositive test tells us that there is exactly one person in the group with the disease, but a disease negative test tells that either 0 or more than 2 people have the disease, we really only get knowledge out of disease positive tests. Say we test with groups of size k. (assume for now that k = 100 so that some calculations become much easier) Then if we get a positive result, which happens with probability .01k*e^.01k = e ^ 1 we can use binary search to isolate the single person with the disease in that group of size k which will take a total of log(100) (base 2) + 1 (about steps including the initial test. So doing this once through the entire population (dividing them into batches of size k), will take 8 * n / 100 steps, or 2/25 * n. We will successfully determine whether e^1 * n are infected or not on average, and the rest we won't know anything about. So after doing this, we are left with (on average) n * (1  e^1) people who tested negative for the disease, but still may be infected. Among the people that we were able to verify, 1/100 people are diseased, so the same ratio of infected people are left among the general population. So we do the exact same thing, choosing 100 random people per sample again. This time, we can reduce to n*(1e^1)^2 people left after an additional 8 * (1e^1) * n / 100 steps. We quickly see that the total number of samples (if we choose k people at each step) needed is an infinite geometric series, which converges to n * 2e / 25, or about .217n. Now I don't think that choosing k = 100 is the best sample size per group. However, choosing different sample sizes will alter the proportion of the remaining population with the disease, and I am too lazy to do all that calculation right now, especially since the sample size k will probably need to change as the percentage of the population that is diseased changes. I'm not sure im understanding the operating practices of the hospital though; I am right in saying that the hospital takes the same effort to process one sample of any size (even if it's just one person)?

« Last Edit: Dec 5^{th}, 2008, 10:09am by teekyman » 
IP Logged 



tomas
Newbie
Posts: 8


Re: infectious disease
« Reply #4 on: Dec 5^{th}, 2008, 5:01pm » 
Quote Modify

sorry for my ambiguous expression. but the result is positive for either 1 infected sample or 0 infected sample in a ksized sample ( in the first reply i assumed that u will understand that 0 infected sample always make the result correct). Quote:I am right in saying that the hospital takes the same effort to process one sample of any size (even if it's just one person)? 
 > yes, u are right one hint :even n > infinity or a random large value of n gives us the same strategy and result , i mean the solution is independent of n . anyways, nice try

« Last Edit: Dec 5^{th}, 2008, 5:06pm by tomas » 
IP Logged 



tomas
Newbie
Posts: 8


Re: infectious disease
« Reply #5 on: Dec 6^{th}, 2008, 7:14am » 
Quote Modify

noone help me ?


IP Logged 



rmsgrey
Uberpuzzler
Gender:
Posts: 2844


Re: infectious disease
« Reply #6 on: Dec 6^{th}, 2008, 10:13am » 
Quote Modify

For clarification, what is the result of the test if: a) there are 0 infected people b) there is exactly 1 infected person c) there are 2 or more infected people in the sample?


IP Logged 



tomas
Newbie
Posts: 8


Re: infectious disease
« Reply #7 on: Dec 6^{th}, 2008, 5:58pm » 
Quote Modify

a)the sample is positive b)the sample is negative here, if it is a sample of size 1 , u will know the infected person if it is a sample of size m (m>=1), it will not inform the name of the infected person, only say "negative" c) the sample is negative


IP Logged 



rmsgrey
Uberpuzzler
Gender:
Posts: 2844


Re: infectious disease
« Reply #8 on: Dec 7^{th}, 2008, 9:09am » 
Quote Modify

In other words, the test tells you whether there is at least one infected person in the group tested rather than telling you whether there is exactly one (which is how people were reading the earlier statements)


IP Logged 



tomas
Newbie
Posts: 8


Re: infectious disease
« Reply #9 on: Dec 7^{th}, 2008, 7:29pm » 
Quote Modify

sorry for my ambiguous expression in early posts. in short , what u can get from the result is : a/ positive : no infected person in the sample b/ negative : at least 1 infected person in the sample


IP Logged 



hawkxor
Newbie
Gender:
Posts: 5


Re: infectious disease
« Reply #10 on: Dec 7^{th}, 2008, 9:48pm » 
Quote Modify

tomas, you are still ambiguous about the abilities of the testing center. is it correct that the center can process 10^4 samples each day, even if each of those samples consists of the blood of 1000 unique people?


IP Logged 



teekyman
Full Member
Gender:
Posts: 199


Re: infectious disease
« Reply #11 on: Dec 8^{th}, 2008, 5:05am » 
Quote Modify

Just dealing with the total number of samples needed, (assuming nothing too funny happens with the processing of samples in the hospital once that gets cleared up), then we only gain information about the infected status of an individual person when either: that person's sampled in a batch of people that are tested positive OR that person is tested individually negative. To avoid testing everyone individually, our only hope to be faster is to perform batch tests of people and hope to clear large groups of people as free of disease at once before we resort to individual testing among a more concentrated pool. If we test a batch of k people when the concentration of infected people is p, then the probability that they are cleared is (1p)^k. the expected number cleared is k * (1p) ^ k if k > 1. With a little calculus and algebra, we see this is maximized when k = 1/ln(1p) ~ 1/p for small p. k as a function of p is a decreasing, concave up function until p = 1  e^2 (about .864), which tells us that greedily trying to maximize k until this point is probably is the right thing to do. After p = .3 or so, that expected value drops to below 1, and we want to start testing people individually anyways. So the idea is to repeatedly: 1) Calculate the proportion p of remaining people in the population that are infected. 2) if this is below p = 1  e^(e^1) (about .3) then test a population of size about k = 1/ln(1p). If its above that value, then test single individuals. 3) If the group has at least one infected person, throw them back in the pool. If they are clear, set them aside. 3) repeat until everyone is tested. I'm sure optimizations can be made with respect to rounding, and perhaps even with what to do with the people who test positive. As for the total number of samples needed for this method: a lot. This will be a little annoying to calculate at best, and I will do it later, provided I am able to.

« Last Edit: Dec 8^{th}, 2008, 5:07am by teekyman » 
IP Logged 



rmsgrey
Uberpuzzler
Gender:
Posts: 2844


Re: infectious disease
« Reply #12 on: Dec 8^{th}, 2008, 12:16pm » 
Quote Modify

A naive solution (which I'm not even going to bother working out expected runtimes for): For any infected sample, divide into two equal samples and test each separately. This will find k infected people in an initial sample of n in less than 2k log_{2}n tests


IP Logged 



River Phoenix
Junior Member
Gender:
Posts: 125


Re: infectious disease
« Reply #13 on: Dec 8^{th}, 2008, 3:08pm » 
Quote Modify

on Dec 8^{th}, 2008, 12:16pm, rmsgrey wrote:A naive solution (which I'm not even going to bother working out expected runtimes for): For any infected sample, divide into two equal samples and test each separately. This will find k infected people in an initial sample of n in less than 2k log_{2}n tests 
 Suppose that we know that at least 1 from a sample of n people is infected: Split in half, test each group. What is the probability that one of the groups is clean? I believe it is simply 2*0.99^(n/2). Then, the problem reduces to 1 instance of the the n/2 case. Otherwise, it reduces to 2 instances. Let f(n) be the number of trials required for a sample of size n. Then f(2n) = (1  0.99^n) * 2 f(n) Clearly f(2) = 2. It follows that: f(2^n) = 2^n [10.99^2][10.99^4]...[10.99^(2^(n1))] = 2^n (10.99)^(n1) [1+0.99][1+0.99^2]...[1+0.99^(2^(n2))] I believe that (1+a)(1+a^2)... converges, hence: f(2^n) < C (0.02)^n f(n) < C (0.02)^log_{2}n So the probabilistic order of growth seems like it is actually very good.

« Last Edit: Dec 8^{th}, 2008, 4:49pm by River Phoenix » 
IP Logged 



tomas
Newbie
Posts: 8


Re: infectious disease
« Reply #14 on: Dec 8^{th}, 2008, 10:19pm » 
Quote Modify

rmsgrey and Phoenix are on the right track .However, so far no answer is correct. You all make this one seem more complicated than it must be... Let solve this extracted version then you can find the solution for the original one: there are 100 people in the town and k infected people among them (1<=k<=100) prove that : the number of tests required <=7k (sorry if i made a mistake again, cause this is the first time i created a puzzle >_<)


IP Logged 



River Phoenix
Junior Member
Gender:
Posts: 125


Re: infectious disease
« Reply #15 on: Dec 8^{th}, 2008, 11:52pm » 
Quote Modify

My calculation must be wrong by the way, the end equation is decreasingb


IP Logged 



towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730


Re: infectious disease
« Reply #16 on: Dec 9^{th}, 2008, 1:34am » 
Quote Modify

on Dec 8^{th}, 2008, 10:19pm, tomas wrote:Let solve this extracted version then you can find the solution for the original one: there are 100 people in the town and k infected people among them (1<=k<=100) prove that : the number of tests required <=7k 
 Do you know k to start with? log_{2} C(100,k) 7k

« Last Edit: Dec 9^{th}, 2008, 1:34am by towr » 
IP Logged 
Wikipedia, Google, Mathworld, Integer sequence DB



tomas
Newbie
Posts: 8


Re: infectious disease
« Reply #17 on: Dec 9^{th}, 2008, 2:04am » 
Quote Modify

Towr, of course ur inequation is correct but let's try that with k=1 [log_{2}C(1,100)]=6 pls tell me a strategy that work with 6 times to find out that 1 infected person ? hidden:  ( i think the optimal solution is 7 here )  denote the times required =f 7k is just a general, loose sup value ex : k=1, f=7 k=2, f=13(not 14)


IP Logged 



towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730


Re: infectious disease
« Reply #18 on: Dec 9^{th}, 2008, 2:12am » 
Quote Modify

on Dec 9^{th}, 2008, 2:04am, tomas wrote:Towr, of course ur inequation is correct but let's try that with k=1 [log_{2}C(1,100)]=6 
 I get 7. x means you round x up. ~6.6 rounds up to 7. Quote:hidden:  ( i think the optimal solution is 7 here )  
 indeed it is. However, the key question is, do we know k beforehand, or do we need to find it, also?

« Last Edit: Dec 9^{th}, 2008, 2:13am by towr » 
IP Logged 
Wikipedia, Google, Mathworld, Integer sequence DB



River Phoenix
Junior Member
Gender:
Posts: 125


Re: infectious disease
« Reply #19 on: Dec 9^{th}, 2008, 3:59am » 
Quote Modify

on Dec 8^{th}, 2008, 3:08pm, River Phoenix wrote: Suppose that we know that at least 1 from a sample of n people is infected: Split in half, test each group. What is the probability that one of the groups is clean? I believe it is simply 2*0.99^(n/2). Then, the problem reduces to 1 instance of the the n/2 case. Otherwise, it reduces to 2 instances. Let f(n) be the number of trials required for a sample of size n. Then f(2n) = (1  0.99^n) * 2 f(n) Clearly f(2) = 2. It follows that: 
 Actually (10.99^n) goes to 1, so if you just take it to be the case that f(2n) < 2 f(n), then f(n) < n. Which is obvious.. so I think this whole strategy is a deadend. I think the reason is that in the steady state as n> infinity and you start subdividing, both halves always have at least one infected person, so it's no better order of growth wise than checking all people separately. As towr notes, you can do it in log_{2}C(n,k) given that you know k (via binary search over all possible sets of k). There is an obvious duality between this and the strategy I analyzed which ends up taking n steps: the sum of log_{2}C(n,k) over all k equals n. Since they both use binary search, they are essentially the same strategy. The cleverness is to separate out the instances by k, but in sum we still haven't really beaten the naive strategy of checking each person separately. This is not that insightful but I think interesting.

« Last Edit: Dec 9^{th}, 2008, 4:01am by River Phoenix » 
IP Logged 



