wu :: forums « wu :: forums - infectious disease » Welcome, Guest. Please Login or Register. Jan 22nd, 2022, 12:48pm RIDDLES SITE WRITE MATH! Home Help Search Members Login Register
 wu :: forums    riddles    hard (Moderators: Eigenray, SMQ, Grimbal, Icarus, towr, ThudnBlunder, william wu)    infectious disease « Previous topic | Next topic »
 Pages: 1 Reply Notify of replies Send Topic Print
 Author Topic: infectious disease  (Read 1750 times)
tomas
Newbie

Posts: 8
 infectious disease   « on: Dec 5th, 2008, 8:26am » Quote Modify

In the town with the population of n people, there is an infectious disease.So far there has been 1% of the residents being infected.
Provided that there is a testing centre which can test up to 10^4 samples per day.

the centre can test either a combination of  blood samples or separate ones

however , if there is more than 1 infectious sample in a m-sized sample (n>m>1), the result will be wrong.else, if there is 0 or 1 infectious sample in a m-sized sample, the result will be correct.

How many days does it take to find out all the infected people
 « Last Edit: Dec 5th, 2008, 8:32pm by tomas » IP Logged
teekyman
Full Member

Gender:
Posts: 199
 Re: infectious disease   « Reply #1 on: Dec 5th, 2008, 8:42am » Quote Modify

Did you mean at least one infected sample makes the test detect the infection, or did you really mean more than 1?

Also, can the center process 10^4 people, or samples?

(I'm also assuming that n >> 10^4 because we're probably not looking for a naive solution.)
 IP Logged
tomas
Newbie

Posts: 8
 Re: infectious disease   « Reply #2 on: Dec 5th, 2008, 8:52am » Quote Modify

if there is only 1 infectious sample in the m-sized sample , then the result will be correct.
else , there are > 1 infectious samples , it will be wrong
the centre can process a m-sized sample (combination of m people's blood sample, here 1<=m<=10^4)
people's blood sample= people (the same )
of course , n >10^4
let's say ,n --> infinity
but the strategy is unchanged for a random value  of n
 « Last Edit: Dec 5th, 2008, 8:53am by tomas » IP Logged
teekyman
Full Member

Gender:
Posts: 199
 Re: infectious disease   « Reply #3 on: Dec 5th, 2008, 10:08am » Quote Modify

Well if having two or more infected people donate samples causes the test to give a false negative infection, then that will complicate things indeed. In this situation since we know that a disease-positive test tells us that there is exactly one person in the group with the disease, but a disease negative test tells that either 0 or more than 2 people have the disease, we really only get knowledge out of disease positive tests. Say we test with groups of size k. (assume for now that k = 100 so that some calculations become much easier)

Then if we get a positive result, which happens with probability .01k*e^-.01k = e ^ -1 we can use binary search to isolate the single person with the disease in that group of size k  which will take a total of log(100) (base 2) + 1 (about steps including the initial test. So doing this once through the entire population (dividing them into batches of size k), will take 8 * n / 100 steps, or 2/25 * n. We will successfully determine whether e^-1 * n are infected or not on average, and the rest we won't know anything about.

So after doing this, we are left with (on average) n * (1 - e^-1) people who tested negative for the disease, but still may be infected. Among the people that we were able to verify, 1/100 people are diseased, so the same ratio of infected people are left among the general population. So we do the exact same thing, choosing 100 random people per sample again. This time, we can reduce to n*(1-e^-1)^2 people left after an additional 8 * (1-e^-1) * n / 100 steps.

We quickly see that the total number of samples (if we choose k people at each step) needed is an infinite geometric series, which converges to n * 2e / 25, or about .217n. Now I don't think that choosing k = 100 is the best sample size per group. However, choosing different sample sizes will alter the proportion of the remaining population with the disease, and I am too lazy to do all that calculation right now, especially since the sample size k will probably need to change as the percentage of the population that is diseased changes.

I'm not sure im understanding the operating practices of the hospital though; I am right in saying that the hospital takes the same effort to process one sample of any size (even if it's just one person)?
 « Last Edit: Dec 5th, 2008, 10:09am by teekyman » IP Logged
tomas
Newbie

Posts: 8
 Re: infectious disease   « Reply #4 on: Dec 5th, 2008, 5:01pm » Quote Modify

sorry for my ambiguous expression.
but the result is positive for either 1 infected sample or 0 infected sample in a k-sized sample ( in the first reply i assumed that u will understand that 0 infected sample always  make the result correct).
Quote:
 I am right in saying that the hospital takes the same effort to process one sample of any size (even if it's just one person)?
---> yes, u are right
one hint :even n --> infinity or  a random  large value of n gives us the same strategy and result , i mean the solution is independent of n .
anyways, nice try
 « Last Edit: Dec 5th, 2008, 5:06pm by tomas » IP Logged
tomas
Newbie

Posts: 8
 Re: infectious disease   « Reply #5 on: Dec 6th, 2008, 7:14am » Quote Modify

noone help me ?
 IP Logged
rmsgrey
Uberpuzzler

Gender:
Posts: 2844
 Re: infectious disease   « Reply #6 on: Dec 6th, 2008, 10:13am » Quote Modify

For clarification, what is the result of the test if:

a) there are 0 infected people
b) there is exactly 1 infected person
c) there are 2 or more infected people

in the sample?
 IP Logged
tomas
Newbie

Posts: 8
 Re: infectious disease   « Reply #7 on: Dec 6th, 2008, 5:58pm » Quote Modify

a)the sample is positive
b)the sample is negative
here, if it is a sample of size 1 , u will know the infected person
if it is a sample of size m (m>=1), it will not inform the name of the infected person, only say "negative"
c) the sample is negative
 IP Logged
rmsgrey
Uberpuzzler

Gender:
Posts: 2844
 Re: infectious disease   « Reply #8 on: Dec 7th, 2008, 9:09am » Quote Modify

In other words, the test tells you whether there is at least one infected person in the group tested rather than telling you whether there is exactly one (which is how people were reading the earlier statements)
 IP Logged
tomas
Newbie

Posts: 8
 Re: infectious disease   « Reply #9 on: Dec 7th, 2008, 7:29pm » Quote Modify

sorry for my ambiguous expression in early posts.
in short ,
what u can get from  the result is :
a/ positive : no infected person in the sample
b/ negative : at least 1 infected person in the sample
 IP Logged
hawkxor
Newbie

Gender:
Posts: 5
 Re: infectious disease   « Reply #10 on: Dec 7th, 2008, 9:48pm » Quote Modify

tomas, you are still ambiguous about the abilities of the testing center.

is it correct that the center can process 10^4 samples each day, even if each of those samples consists of the blood of 1000 unique people?
 IP Logged
teekyman
Full Member

Gender:
Posts: 199
 Re: infectious disease   « Reply #11 on: Dec 8th, 2008, 5:05am » Quote Modify

Just dealing with the total number of samples needed, (assuming nothing too funny happens with the processing of samples in the hospital once that gets cleared up), then we only gain information about the infected status of an individual person when either: that person's sampled in a batch of people that are tested positive OR that person is tested individually negative. To avoid testing everyone individually, our only hope to be faster is to perform batch tests of people and hope to clear large groups of people as free of disease at once before we resort to individual testing among a more concentrated pool.

If we test a batch of k people when the concentration of infected people is p, then the probability that they are cleared is (1-p)^k. the expected number cleared is k * (1-p) ^ k if k > 1. With a little calculus and algebra, we see this is maximized when k = -1/ln(1-p) ~ 1/p for small p. k as a function of p is a decreasing, concave up function until p = 1 - e^-2 (about .864), which tells us that greedily trying to maximize k until this point is probably is the right thing to do. After p = .3 or so, that expected value drops to below 1, and we want to start testing people individually anyways.

So the idea is to repeatedly:

1) Calculate the proportion p of remaining people in the population that are infected.
2) if this is below p = 1 - e^(-e^-1) (about .3) then test a population of size about k = -1/ln(1-p). If its above that value, then test single individuals.
3) If the group has at least one infected person, throw them back in the pool. If they are clear, set them aside.
3) repeat until everyone is tested.

I'm sure optimizations can be made with respect to rounding, and perhaps even with what to do with the people who test positive.

As for the total number of samples needed for this method: a lot. This will be a little annoying to calculate at best, and I will do it later, provided I am able to.
 « Last Edit: Dec 8th, 2008, 5:07am by teekyman » IP Logged
rmsgrey
Uberpuzzler

Gender:
Posts: 2844
 Re: infectious disease   « Reply #12 on: Dec 8th, 2008, 12:16pm » Quote Modify

A naive solution (which I'm not even going to bother working out expected runtimes for):

For any infected sample, divide into two equal samples and test each separately. This will find k infected people in an initial sample of n in less than 2k log2n tests
 IP Logged
River Phoenix
Junior Member

Gender:
Posts: 125
 Re: infectious disease   « Reply #13 on: Dec 8th, 2008, 3:08pm » Quote Modify

on Dec 8th, 2008, 12:16pm, rmsgrey wrote:
 A naive solution (which I'm not even going to bother working out expected runtimes for):   For any infected sample, divide into two equal samples and test each separately. This will find k infected people in an initial sample of n in less than 2k log2n tests

Suppose that we know that at least 1 from a sample of n people is infected:

Split in half, test each group. What is the probability that one of the groups is clean? I believe it is simply 2*0.99^(n/2). Then, the problem reduces to 1 instance of the the n/2 case. Otherwise, it reduces to 2 instances.

Let f(n) be the number of trials required for a sample of size n. Then f(2n) = (1 - 0.99^n) * 2 f(n)

Clearly f(2) = 2. It follows that:

f(2^n) = 2^n [1-0.99^2][1-0.99^4]...[1-0.99^(2^(n-1))]
= 2^n (1-0.99)^(n-1) [1+0.99][1+0.99^2]...[1+0.99^(2^(n-2))]
I believe that (1+a)(1+a^2)... converges, hence:

f(2^n) < C (0.02)^n

f(n) < C (0.02)^log2n

So the probabilistic order of growth seems like it is actually very good.
 « Last Edit: Dec 8th, 2008, 4:49pm by River Phoenix » IP Logged
tomas
Newbie

Posts: 8
 Re: infectious disease   « Reply #14 on: Dec 8th, 2008, 10:19pm » Quote Modify

rmsgrey and Phoenix are on the right track .However, so far no answer is correct.
You all make this one seem more complicated than it must be...
Let solve this extracted version then you can find the solution for the original one:
there are 100 people in the town and k infected people among them (1<=k<=100)
prove that : the number of tests required <=7k
(sorry if i made a mistake again, cause this is the first time i created a puzzle >_<)
 IP Logged
River Phoenix
Junior Member

Gender:
Posts: 125
 Re: infectious disease   « Reply #15 on: Dec 8th, 2008, 11:52pm » Quote Modify

My calculation must be wrong by the way, the end equation is decreasingb
 IP Logged
towr
wu::riddles Moderator
Uberpuzzler

Some people are average, some are just mean.

Gender:
Posts: 13730
 Re: infectious disease   « Reply #16 on: Dec 9th, 2008, 1:34am » Quote Modify

on Dec 8th, 2008, 10:19pm, tomas wrote:
 Let solve this extracted version then you can find the solution for the original one: there are 100 people in the town and k infected people among them (1<=k<=100) prove that : the number of tests required <=7k
log2 C(100,k)  7k
 « Last Edit: Dec 9th, 2008, 1:34am by towr » IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
tomas
Newbie

Posts: 8
 Re: infectious disease   « Reply #17 on: Dec 9th, 2008, 2:04am » Quote Modify

Towr, of course ur inequation is correct
but let's try that with k=1
[log2C(1,100)]=6
pls tell me a strategy that work with 6 times to find out that 1 infected person ?
 hidden: ( i think the optimal solution is 7 here )

denote the times required =f
7k is just a general, loose sup value
ex :
k=1, f=7
k=2, f=13(not 14)
 IP Logged
towr
wu::riddles Moderator
Uberpuzzler

Some people are average, some are just mean.

Gender:
Posts: 13730
 Re: infectious disease   « Reply #18 on: Dec 9th, 2008, 2:12am » Quote Modify

on Dec 9th, 2008, 2:04am, tomas wrote:
 Towr, of course ur inequation is correct but let's try that with k=1   [log2C(1,100)]=6
I get 7. x means you round x up. ~6.6 rounds up to 7.

Quote:
 hidden: ( i think the optimal solution is 7 here )
indeed it is.

However, the key question is, do we know k beforehand, or do we need to find it, also?
 « Last Edit: Dec 9th, 2008, 2:13am by towr » IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
River Phoenix
Junior Member

Gender:
Posts: 125
 Re: infectious disease   « Reply #19 on: Dec 9th, 2008, 3:59am » Quote Modify

on Dec 8th, 2008, 3:08pm, River Phoenix wrote:
 Suppose that we know that at least 1 from a sample of n people is infected:   Split in half, test each group. What is the probability that one of the groups is clean? I believe it is simply 2*0.99^(n/2). Then, the problem reduces to 1 instance of the the n/2 case. Otherwise, it reduces to 2 instances.   Let f(n) be the number of trials required for a sample of size n. Then f(2n) = (1 - 0.99^n) * 2 f(n)   Clearly f(2) = 2. It follows that:

Actually (1-0.99^n) goes to 1, so if you just take it to be the case that f(2n) < 2 f(n), then f(n) < n. Which is obvious.. so I think this whole strategy is a deadend. I think the reason is that in the steady state as n-> infinity and you start subdividing, both halves always have at least one infected person, so it's no better order of growth wise than checking all people separately.

As towr notes, you can do it in log2C(n,k) given that you know k (via binary search over all possible sets of k).

There is an obvious duality between this and the strategy I analyzed which ends up taking n steps: the sum of log2C(n,k) over all k equals n. Since they both use binary search, they are essentially the same strategy. The cleverness is to separate out the instances by k, but in sum we still haven't really beaten the naive strategy of checking each person separately.

This is not that insightful but I think interesting.
 « Last Edit: Dec 9th, 2008, 4:01am by River Phoenix » IP Logged
 Pages: 1 Reply Notify of replies Send Topic Print

 Forum Jump: ----------------------------- riddles -----------------------------  - easy   - medium => hard   - what am i   - what happened   - microsoft   - cs   - putnam exam (pure math)   - suggestions, help, and FAQ   - general problem-solving / chatting / whatever ----------------------------- general -----------------------------  - guestbook   - truth   - complex analysis   - wanted   - psychology   - chinese « Previous topic | Next topic »