wu :: forums
« wu :: forums - Uniform sampling »

Welcome, Guest. Please Login or Register.
May 10th, 2024, 1:42pm

RIDDLES SITE WRITE MATH! Home Home Help Help Search Search Members Members Login Login Register Register
   wu :: forums
   riddles
   cs
(Moderators: Grimbal, towr, Icarus, SMQ, ThudnBlunder, Eigenray, william wu)
   Uniform sampling
« Previous topic | Next topic »
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: Uniform sampling  (Read 1126 times)
Terps.Go
Newbie
*





   


Gender: male
Posts: 13
Uniform sampling  
« on: Jan 23rd, 2005, 8:20pm »
Quote Quote Modify Modify

Google interview:
We have real-time data coming in and we don't know the probability distribution of the coming data. Derive a algorithm to uniformly sample the coming data.
IP Logged
puzzlecracker
Senior Riddler
****



Men have become the tools of their tools

   


Gender: male
Posts: 319
Re: Uniform sampling  
« Reply #1 on: Jan 23rd, 2005, 9:47pm »
Quote Quote Modify Modify

hash every data-piece in the incoming dataset to a number and increment if identical item comes in. you can also strip words for having the same root as well as eliminate the stop words (ex. the,a, an,etc.).  This way you can build a lerning model. and augment for each dataset.   After you determine that the model is acceptable, you can sample the relevance of date base  on hash.  
 
 so that you should know: most of the algorithms in Google are based on some sort of hash.
IP Logged

While we are postponing, life speeds by
Terps.Go
Newbie
*





   


Gender: male
Posts: 13
Re: Uniform sampling  
« Reply #2 on: Jan 24th, 2005, 4:29am »
Quote Quote Modify Modify

That's my first solution. Does't work since they want uniformly sampling in time, not in value. If tyou hash by value, you get uniform by value.
Thanks anyway!
IP Logged
sssnoozze
Guest

Email

Re: Uniform sampling  
« Reply #3 on: Jan 26th, 2005, 5:10pm »
Quote Quote Modify Modify Remove Remove

I don't get, do I -- start a timer, when it expires, take a sample, restart the timer?
IP Logged
Terps.Go
Newbie
*





   


Gender: male
Posts: 13
Re: Uniform sampling  
« Reply #4 on: Jan 26th, 2005, 7:17pm »
Quote Quote Modify Modify

Hi sssnoozze,
 
Nope! Since the incoming dataset is not uniform. If you start timer and do sampling -> you will not get a uniform representation of the data set.
IP Logged
puzzlecracker
Senior Riddler
****



Men have become the tools of their tools

   


Gender: male
Posts: 319
Re: Uniform sampling  
« Reply #5 on: Jan 26th, 2005, 10:07pm »
Quote Quote Modify Modify

maybe you can make it work by randomly assigning priority to every data in the date set.. will that work? (where is towr with his kick-butt solutions)
IP Logged

While we are postponing, life speeds by
sssnoozze
Guest

Email

Re: Uniform sampling  
« Reply #6 on: Jan 27th, 2005, 8:50am »
Quote Quote Modify Modify Remove Remove

on Jan 26th, 2005, 7:17pm, Terps.Go wrote:
Hi sssnoozze,
 
Nope! Since the incoming dataset is not uniform. If you start timer and do sampling -> you will not get a uniform representation of the data set.

 
I always thought that uniform sampling was sampling with constant interval.  Did you mean something else?
 
Can you please restate what the problem is?
IP Logged
Terps.Go
Newbie
*





   


Gender: male
Posts: 13
Re: Uniform sampling  
« Reply #7 on: Jan 27th, 2005, 1:51pm »
Quote Quote Modify Modify

Hi,
 
Let me explain a little bit. For example the flow of dataset is comming with different speed. Sometimes it can be fast, sometimes slow. You need to derive an algorithm to get a representative sample of the dataset
IP Logged
gniv
Newbie
*





   


Gender: male
Posts: 19
Re: Uniform sampling  
« Reply #8 on: Feb 3rd, 2005, 11:21am »
Quote Quote Modify Modify

If you want to find the answer and don't mind reading a bit, search for "sampling with reservoir".  
 
Hint: :: Keep a counter with the number of items you have seen so far. ::
« Last Edit: Feb 8th, 2005, 1:02am by gniv » IP Logged
John_Gaughan
Uberpuzzler
*****



Behold, the power of cheese!

5187759 5187759   john23874   SnowmanJTG
WWW Email

Gender: male
Posts: 767
Re: Uniform sampling  
« Reply #9 on: Feb 4th, 2005, 7:06am »
Quote Quote Modify Modify

I think Dr. Knuth has a solution to this in "Searching and Sorting", one of the volumes in the Art of Computer Programming.
IP Logged

x = (0x2B | ~0x2B)
x == the_question
gniv
Newbie
*





   


Gender: male
Posts: 19
Re: Uniform sampling  
« Reply #10 on: Feb 8th, 2005, 1:01am »
Quote Quote Modify Modify

My solution:
::
Assume for now that we want a sample of size 1 (one item).  
When the item number t comes, choose it as the random sample with probability 1/t. Otherwise, keep the sample that you had.
 
Justification: Consider the first n items in the stream of data: x1, x2, ... xn. The probability that the random sample x at this point in time is xi is
P(x=xi) = 1/i [cdot] i/(i+1) [cdot] (i+1)/(i+2) [cdot] ... [cdot] (n-1)/n = 1/n, so the sample is random.
 
To keep an array of k samples, at step t choose a random item from the sample array and replace it with the current item from the stream with probability k/t (after step k, of course).
::  
 
PS: Terps.Go: Did you interview at Google? What else were you asked?
« Last Edit: Feb 8th, 2005, 1:05am by gniv » IP Logged
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board