Author |
Topic: Uniform sampling (Read 1126 times) |
|
Terps.Go
Newbie
Gender:
Posts: 13
|
|
Uniform sampling
« on: Jan 23rd, 2005, 8:20pm » |
Quote Modify
|
Google interview: We have real-time data coming in and we don't know the probability distribution of the coming data. Derive a algorithm to uniformly sample the coming data.
|
|
IP Logged |
|
|
|
puzzlecracker
Senior Riddler
Men have become the tools of their tools
Gender:
Posts: 319
|
|
Re: Uniform sampling
« Reply #1 on: Jan 23rd, 2005, 9:47pm » |
Quote Modify
|
hash every data-piece in the incoming dataset to a number and increment if identical item comes in. you can also strip words for having the same root as well as eliminate the stop words (ex. the,a, an,etc.). This way you can build a lerning model. and augment for each dataset. After you determine that the model is acceptable, you can sample the relevance of date base on hash. so that you should know: most of the algorithms in Google are based on some sort of hash.
|
|
IP Logged |
While we are postponing, life speeds by
|
|
|
Terps.Go
Newbie
Gender:
Posts: 13
|
|
Re: Uniform sampling
« Reply #2 on: Jan 24th, 2005, 4:29am » |
Quote Modify
|
That's my first solution. Does't work since they want uniformly sampling in time, not in value. If tyou hash by value, you get uniform by value. Thanks anyway!
|
|
IP Logged |
|
|
|
sssnoozze
Guest
|
I don't get, do I -- start a timer, when it expires, take a sample, restart the timer?
|
|
IP Logged |
|
|
|
Terps.Go
Newbie
Gender:
Posts: 13
|
|
Re: Uniform sampling
« Reply #4 on: Jan 26th, 2005, 7:17pm » |
Quote Modify
|
Hi sssnoozze, Nope! Since the incoming dataset is not uniform. If you start timer and do sampling -> you will not get a uniform representation of the data set.
|
|
IP Logged |
|
|
|
puzzlecracker
Senior Riddler
Men have become the tools of their tools
Gender:
Posts: 319
|
|
Re: Uniform sampling
« Reply #5 on: Jan 26th, 2005, 10:07pm » |
Quote Modify
|
maybe you can make it work by randomly assigning priority to every data in the date set.. will that work? (where is towr with his kick-butt solutions)
|
|
IP Logged |
While we are postponing, life speeds by
|
|
|
sssnoozze
Guest
|
on Jan 26th, 2005, 7:17pm, Terps.Go wrote:Hi sssnoozze, Nope! Since the incoming dataset is not uniform. If you start timer and do sampling -> you will not get a uniform representation of the data set. |
| I always thought that uniform sampling was sampling with constant interval. Did you mean something else? Can you please restate what the problem is?
|
|
IP Logged |
|
|
|
Terps.Go
Newbie
Gender:
Posts: 13
|
|
Re: Uniform sampling
« Reply #7 on: Jan 27th, 2005, 1:51pm » |
Quote Modify
|
Hi, Let me explain a little bit. For example the flow of dataset is comming with different speed. Sometimes it can be fast, sometimes slow. You need to derive an algorithm to get a representative sample of the dataset
|
|
IP Logged |
|
|
|
gniv
Newbie
Gender:
Posts: 19
|
|
Re: Uniform sampling
« Reply #8 on: Feb 3rd, 2005, 11:21am » |
Quote Modify
|
If you want to find the answer and don't mind reading a bit, search for "sampling with reservoir". Hint: :: Keep a counter with the number of items you have seen so far. ::
|
« Last Edit: Feb 8th, 2005, 1:02am by gniv » |
IP Logged |
|
|
|
John_Gaughan
Uberpuzzler
Behold, the power of cheese!
Gender:
Posts: 767
|
|
Re: Uniform sampling
« Reply #9 on: Feb 4th, 2005, 7:06am » |
Quote Modify
|
I think Dr. Knuth has a solution to this in "Searching and Sorting", one of the volumes in the Art of Computer Programming.
|
|
IP Logged |
x = (0x2B | ~0x2B) x == the_question
|
|
|
gniv
Newbie
Gender:
Posts: 19
|
|
Re: Uniform sampling
« Reply #10 on: Feb 8th, 2005, 1:01am » |
Quote Modify
|
My solution: :: Assume for now that we want a sample of size 1 (one item). When the item number t comes, choose it as the random sample with probability 1/t. Otherwise, keep the sample that you had. Justification: Consider the first n items in the stream of data: x1, x2, ... xn. The probability that the random sample x at this point in time is xi is P(x=xi) = 1/i [cdot] i/(i+1) [cdot] (i+1)/(i+2) [cdot] ... [cdot] (n-1)/n = 1/n, so the sample is random. To keep an array of k samples, at step t choose a random item from the sample array and replace it with the current item from the stream with probability k/t (after step k, of course). :: PS: Terps.Go: Did you interview at Google? What else were you asked?
|
« Last Edit: Feb 8th, 2005, 1:05am by gniv » |
IP Logged |
|
|
|
|