wu :: forums
« wu :: forums - The hardest interview question I met so far »

Welcome, Guest. Please Login or Register.
Mar 28th, 2024, 4:02pm

RIDDLES SITE WRITE MATH! Home Home Help Help Search Search Members Members Login Login Register Register
   wu :: forums
   riddles
   cs
(Moderators: william wu, Eigenray, SMQ, Icarus, towr, ThudnBlunder, Grimbal)
   The hardest interview question I met so far
« Previous topic | Next topic »
Pages: 1 2 3  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: The hardest interview question I met so far  (Read 37616 times)
mathews1978
Newbie
*





   


Posts: 3
Re: The hardest interview question I met so far  
« Reply #25 on: Mar 18th, 2006, 7:04pm »
Quote Quote Modify Modify

How about this solution ?
 
_________________________________________
 
#include<iostream>
using std::cin;;
using std::cout;
 
int main()
{
 
int A[10];
 
int B[10];
 
bool atraverse = true;
 
int sumToBeatAIndex=0;
 
int sumToBeatBIndex=1;
 
int j=1;
 
int k=0;
 
int i;
 
int sumtobeat;
 
int n;
 
int tempBIndex;
 
int tempAIndex;
 
 
cout<<"\nEnter the size\n";
 
cin>>n;
 
cout<<"\nEnter the numbers in A (sorted)\n";
 
for(i=0;i<n;i++)
 
 
cin>>A[i];
 
cout<<"\nEnter the numbers in B (sorted)\n";
 
for(i=0;i<n;i++)
 
 
cin>>B[i];
 
 
cout<<"("<<B[0]<<","<<A[0]<<") ";
 
sumtobeat=A[0]+B[1];
 
for(i=1;i<n;i++)
 
{
 
 
if(B[k]+A[j]>sumtobeat)
 
 
 
 
cout<<"("<<B[k]<<","<<A[j]<<") ";
 
 
else
 
 
{
 
 
 
sumtobeat=B[k]+A[j];
 
 
 
tempBIndex=k;
 
 
 
tempAIndex=j;
 
 
 
j=sumToBeatAIndex;
 
 
 
k=sumToBeatBIndex;
 
 
 
sumToBeatAIndex=tempAIndex;
 
 
 
sumToBeatBIndex=tempBIndex;
 
 
 
cout<<"("<<B[k]<<","<<A[j]<<") ";
 
 
 
atraverse=!(atraverse);
 
 
}
 
 
if(atraverse)
 
 
 
j++;
 
 
else
 
 
 
k++;
 
}
 
cout<<"\n";
 
return 0;
}
 
_________________________________________
IP Logged
ChunkTug
Full Member
***






   


Gender: male
Posts: 166
Re: The hardest interview question I met so far  
« Reply #26 on: Mar 19th, 2006, 3:27am »
Quote Quote Modify Modify

Consider the case:
A = {2,1,0}
B = {2,1,0}
 
The desired pairs are:
(2,2) (2,1) (1,2)
 
Your algorithm won't add both (2,1) and (1,2). At each step either j or k is incremented and/or swapped. The sum j+k increases by one each iteration.
 
The only cases where this algorithm will produce the correct answer are when the solution is:
  {(A[0],B[0]), (A[0],B[1]), ... , (A[0],B[N-1])}
  ...or...
  {(A[0],B[0]), (A[1],B[0]), ... , (A[N-1],B[0])}  
 
« Last Edit: Mar 19th, 2006, 3:33am by ChunkTug » IP Logged
mathews1978
Newbie
*





   


Posts: 3
Re: The hardest interview question I met so far  
« Reply #27 on: Mar 19th, 2006, 10:15am »
Quote Quote Modify Modify

Try compiling and running this. It will give you the correct answer for the case you mentioned. If you look at the code j and k are not always incremented. Either j or k is incremented - but not both - during each pass through the loop. So you will get the correct pairs. Just copy the code to visual studio compile and run and see.
IP Logged
mathews1978
Newbie
*





   


Posts: 3
Re: The hardest interview question I met so far  
« Reply #28 on: Mar 19th, 2006, 10:20am »
Quote Quote Modify Modify

And I am not incrementing j or k in each pass through the loop. If you look at the 1st else block inside the loop I am making the value of j and k, sumtobeatAIndex and sumtobeatBIndex. They could be less than the current value of j and k. So j and k always dont get incremented.
IP Logged
ChunkTug
Full Member
***






   


Gender: male
Posts: 166
Re: The hardest interview question I met so far  
« Reply #29 on: Mar 19th, 2006, 4:37pm »
Quote Quote Modify Modify

Yeah, I was misreading it. However, there is a problem with the traversal. It will only add pairs where either j=0 or k=0.
 
A=5,4,1,0
B=5,3,1,0
 
Desired output:
(5,5) (5,4) (3,5) (4,3)
 
Program's output:
(5,5) (5,4) (3,5) (5,1)
 
If you were to plot AxB on a grid the possible solutions are a "stairstep"-like pattern. A problem arises when you find the next highest pair that the list of candidates for the next pair can grow linearly. The total number of solutions (collections of pairs) for a given N is equal to P(N), The Partition Function P. Each solution can be put into a 1-1 mapping with a partition of N (eg. The partition of 6 into 3+2+1 would give 3 pairs from the first column, 2 from the second, 1 from the third). See the image below for the possible solutions when N=6.
 

 
I suspect that an O(N) solution does not exist for this problem. If I recall correctly a randomized linear solution was proposed KWillets & Jelani, but was incomplete. Also, fanduboy proposed an algorithm he believed was O(N) that no one could refute because we couldn't tell what the frack was going on  Tongue
« Last Edit: Mar 19th, 2006, 4:39pm by ChunkTug » IP Logged
KWillets
Junior Member
**





   


Posts: 84
Re: The hardest interview question I met so far  
« Reply #30 on: Mar 21st, 2006, 10:04am »
Quote Quote Modify Modify

Thanks for the link to P(n).  From the estimates it looks like the output size is still O(n) bits, but we don't have any further insight on the inner workings.
 
I was working on some code to test it empirically, using a probabilistic approach, but I haven't had time to finish it.  The approach was to move upper and lower stairstep boundaries towards the correct one, picking a new pivot point within the bounds each iteration.
IP Logged
Ivan
Junior Member
**





   


Gender: male
Posts: 70
Re: The hardest interview question I met so far  
« Reply #31 on: Apr 15th, 2006, 4:53pm »
Quote Quote Modify Modify

Hmm. I agree. I think that the point is we only need to give out the n pairs, NOT NECESSARILY in any order.  
 
BTW, I am really happy to see people interested in this problem. Smiley
IP Logged
delphos
Newbie
*





   


Posts: 2
Re: The hardest interview question I met so far  
« Reply #32 on: Apr 18th, 2006, 7:42am »
Quote Quote Modify Modify

Please consider my solution:
 
Code:
public void Calculate() {
    int ai = 0;
    int bi = 0;
    int at = 0;
    int bt = 0;
 
    s[0] = a[ai] + b[bi];
 
    for (int i = 1; i < s.Length; i++) {
        if (a[ai+1] + b[bt] > b[bi+1] + a[at])
            s[i] = a[ai+1] + b[bt++];
        else
            s[i] = b[bi+1] + a[at++];
 
        if (at > ai) {
            bi++;
            at = 0;
        }
 
        if (bt > bi) {
            ai++;
            bt = 0;
        }
    }
}
« Last Edit: Apr 19th, 2006, 1:43am by delphos » IP Logged
srinath
Newbie
*





   


Gender: male
Posts: 4
Re: The hardest interview question I met so far  
« Reply #33 on: Jun 2nd, 2006, 10:31am »
Quote Quote Modify Modify

i suppose delphos' solution is of  o(n) and  probably the required  
one....if not please specify the flaw in it...
IP Logged
SMQ
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 2084
Re: The hardest interview question I met so far  
« Reply #34 on: Jun 2nd, 2006, 11:33am »
Quote Quote Modify Modify

See ChunkTug's post above -- delphos' solution fails in the same way:
 
s[0] = 5 + 5; ai = 0; bi = 0; at = 0; bt = 0
s[1] = 4 + 5; ai = 1; bi = 0; at = 0; bt = 0
s[2] = 5 + 3; ai = 1; bi = 1; at = 0; bt = 0
s[3] = 1 + 5
 
where s[3] should be 3 + 4, but that sum is not considered by the comparison
 
--SMQ
IP Logged

--SMQ

srinath
Newbie
*





   


Gender: male
Posts: 4
Re: The hardest interview question I met so far  
« Reply #35 on: Jun 2nd, 2006, 11:55am »
Quote Quote Modify Modify

i think SMQ's simulation is wrong at s[2]...
it is actually s[2]=5+3;ai=1;bi=0;at=1;bt=0;
so...
s[3] = 4+3;ai=1;bi=1;at=0;bt=0;
this indeed seems to be a good solution....the point is only two elements are compared and the elements to be compared are kept track of using the counters 'at' and 'bt'....
...Correct me if i'm wrong...
IP Logged
SMQ
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 2084
Re: The hardest interview question I met so far  
« Reply #36 on: Jun 2nd, 2006, 1:38pm »
Quote Quote Modify Modify

Embarassed That's what I get for executing code in my head instead of on a computer.
 
I'm still fairly certain it will fail in a similar manner -- where the two elements it's considering don't include the highest sum -- let me look for a failure case and I'll get back to you...
 
--SMQ
IP Logged

--SMQ

SMQ
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 2084
Re: The hardest interview question I met so far  
« Reply #37 on: Jun 2nd, 2006, 2:24pm »
Quote Quote Modify Modify

OK, take the same example ChunkTug gave:
 
A = {5, 4, 1, 0 }
B = {5, 3, 1, 0 }
 
but look at the first nine elements instead of just the first four:
 
s[0] = 5 + 5; ai = 0; bi = 0; at = 0; bt = 0
s[1] = 4 + 5; ai = 1; bi = 0; at = 0; bt = 0
s[2] = 5 + 3; ai = 1; bi = 0; at = 1; bt = 0
s[3] = 4 + 3; ai = 1; bi = 1; at = 0; bt = 0
s[4] = 5 + 1; ai = 1; bi = 1; at = 1; bt = 0
s[5] = 1 + 5; ai = 1; bi = 1; at = 1; bt = 1
s[6] = 4 + 1; ai = 1; bi = 2; at = 0; bt = 1
s[7] = 5 + 0; ai = 1; bi = 2; at = 1; bt = 1
s[8] = 4 + 0
 
s[8] should have been 0 + 5, but that sum wasn't considered by the comparison.
 
--SMQ
IP Logged

--SMQ

srinath
Newbie
*





   


Gender: male
Posts: 4
Re: The hardest interview question I met so far  
« Reply #38 on: Jun 7th, 2006, 11:35pm »
Quote Quote Modify Modify

SMQ is correct to the point that there  is a flaw in the delphos' solution...that is because he considered only the condition of the two pointer pairs (ai,at),(bi,bt) crossing each other..but there is another constraint which he failed to check..i.e the movement of the pointer is based on the relative difference between the elements also...thus this code would solve the problem in delphos' solution....
here da[ ] and db[ ] are the difference encoded arrays and the last element in this array is initialised to +infinityi.e integer limit....
 
printf("%d %d\n",aa[0],bb[0]);  
  for(i=1;i<2*n+4;i++)
  {
    if(aa[at]+bb[bi]>bb[bt]+aa[ai])
    {
  printf("%d %d\n",aa[at],bb[bi]);
  if(at==bt&&bi==ai&&bi!=n-1&&at!=0)
   {at=bt=0;bi++;ai++;}
  else{
  if(at<bi&&da[at]<=db[bi])
    at++;
  else
   {at=0;bi++;}  
     }
    }
    else
    {
 printf("%d %d\n",aa[ai],bb[bt]);  
 if(at==bt&&bi==ai&&bi!=n-1&&at!=0)
  {at=bt=0;bi++;ai++;}
 else{  
 if(bt<ai&&db[bt]<=da[ai])  
   bt++;
 else
 {bt=0;ai++;}    
     }
    }
 
  }
This code though solves the problem in delphos' solution ,it has  its own flaws...i.e the flaw in this code is that the pointer bt is reverted to 0 when the relative defference i.e da[ai] is less than that in db[bt] ...this is fine when bt=ai...when bt<ai...
there some more combinations of (bt+1...ai) and ai which could result in a larger value in latter part of the problem....this can be taken care of by using a queue to store the value of bt and ai whenever there is a need to revert bt back to 1 ....then when there is a need to increment bt at a latter stage ,this previous state can be restored and allowed to proceed till bt=ai....then the next stored state from the queue and so on.....
this procedure applies for both (bt,ai)and (at,bi).....i'll code it and upload it soon...correct me if i'm wrong or i'm not clear any where...
IP Logged
srinath
Newbie
*





   


Gender: male
Posts: 4
Re: The hardest interview question I met so far  
« Reply #39 on: Jun 7th, 2006, 11:39pm »
Quote Quote Modify Modify

you may avoid bothering about  
 
if(at==bt&&bi==ai&&bi!=n-1&&at!=0)
   {at=bt=0;bi++;ai++;}  
after the two printf as they are to avoid the same elements being considered twice...
IP Logged
Brian
Newbie
*





   


Posts: 4
Re: The hardest interview question I met so far  
« Reply #40 on: Jul 10th, 2006, 12:46pm »
Quote Quote Modify Modify

Description of solution:
 

1. Pretend like we are doing an O(N^2) loop: One loop iterates through all elements of A, where an inner loop iterates through all elements of B.  Short circuit the inner loop if it turns out we'd get a higher value by jumping to the next iteration of the outer loop.  
 
2. Make two such crawls in parallel - the other crawl reverses the loops.
 
3. Get N results from the crawl.  Because the crawls may overlap, running time is 2N, but of course this is still O(N).

 
 
Code:

# Python source
 
a=[5,4,1,0]
b=[5,3,1,0]
 
class acc:
 
  def __init__(self,ain,bin):
    self.a = ain
    self.b = bin
    self.ai=0
    self.bi=0
 
  def getval(self):
    return self.a[self.ai]+self.b[self.bi]
 
  def crawlAB(self):
    friendly=str(self.a[self.ai])+","+str(self.b[self.bi])
    if self.bi+1==len(self.b) or self.a[self.ai+1]+self.b[0]>self.a[self.ai]+self.b[self.bi+1]:
 self.ai=self.ai+1
 self.bi=0
    else:
 self.bi=self.bi+1
    return friendly
   
  def crawlBA(self):
    friendly=str(self.a[self.ai])+","+str(self.b[self.bi])
    if self.ai+1==len(self.a) or self.b[self.bi+1]+self.a[0]>self.b[self.bi]+self.a[self.ai+1]:
 self.bi=self.bi+1
 self.ai=0
    else:
 self.ai=self.ai+1
    return friendly  
 
acc1 = acc(a,b)
acc2 = acc(a,b)
printed = 0
visited={}
 
while printed<len(a):
  friendly = ""
  if acc1.getval()>=acc2.getval():
    friendly = acc1.crawlAB()
  else:
    friendly = acc2.crawlBA()
  if friendly not in visited:
    print friendly
    visited[friendly]=True
    printed = printed + 1
IP Logged
SMQ
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 2084
Re: The hardest interview question I met so far  
« Reply #41 on: Jul 10th, 2006, 4:17pm »
Quote Quote Modify Modify

on Jul 10th, 2006, 12:46pm, Brian wrote:
Short circuit the inner loop if it turns out we'd get a higher value by jumping to the next iteration of the outer loop.

An interesting approach -- too bad it doesn't work. ;)
 
Consider a = b = {5, 4, 3, 2, 1}.  Your program returns
(5,5) (5,4) (4,5) (4,4) (3,5) (5,3) (3,4) (2,5) (4,3) (5,2) (2,4) (1,5) (4,2) (5,1) (1,4) ...
 
Oops: missed (3,3) because neither loop was considering it.  Plus, it looks like checking for pairs you've already seen (visited) takes worst-case either N2 extra memory or log2N time for each check, pushing you past O(N) overall.
 
--SMQ
IP Logged

--SMQ

Brian
Newbie
*





   


Posts: 4
Re: The hardest interview question I met so far  
« Reply #42 on: Jul 10th, 2006, 4:42pm »
Quote Quote Modify Modify

Quote:
Consider a = b = {5, 4, 3, 2, 1}.  Your program returns
(5,5) (5,4) (4,5) (4,4) (3,5) (5,3) (3,4) (2,5) (4,3) (5,2) (2,4) (1,5) (4,2) (5,1) (1,4) ...

Hrm?  I see the output as:
5,5 5,4 4,5 4,4 3,5
 
Note the original problem asks for the top N pairs, where N is the length of A and B.  3,3 shouldn't be in the list of the top 5 here (you can see yourself that all of the above have a sum greater then 6).
 
Quote:
Plus, it looks like checking for pairs you've already seen (visited) takes worst-case either N2 extra memory or log2N time for each check, pushing you past O(N) overall.

 
visited is a dictionary (i.e. hashmap).  insert and search are O(1) operations, and I do O(N) of such operations.  It also uses only O(N) extra memory.
IP Logged
SMQ
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 2084
Re: The hardest interview question I met so far  
« Reply #43 on: Jul 10th, 2006, 5:09pm »
Quote Quote Modify Modify

Hrm, I guess the problem statement does use the same N for the size of the arrays and the size of the results.  OK, try a = b = {5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} Wink
 
As for the hashmap, worst case insert and/or search are still O(log2N) (or worse, depending on the implementation.)
 
--SMQ
IP Logged

--SMQ

Brian
Newbie
*





   


Posts: 4
Re: The hardest interview question I met so far  
« Reply #44 on: Jul 10th, 2006, 6:27pm »
Quote Quote Modify Modify

Quote:
Hrm, I guess the problem statement does use the same N for the size of the arrays and the size of the results.  OK, try a = b = {5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

A and B are sets - they can't contain duplicate items Smiley
 
Quote:
As for the hashmap, worst case insert and/or search are still O(log2N) (or worse, depending on the implementation.)

Not if you use a perfect hash table.  In any event, you could use a N^2'd sized array.  Yes, a lot of extra storage, but I think it solves the problem.
IP Logged
SMQ
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 2084
Re: The hardest interview question I met so far  
« Reply #45 on: Jul 10th, 2006, 8:26pm »
Quote Quote Modify Modify

on Jul 10th, 2006, 6:27pm, Brian wrote:
A and B are sets - they can't contain duplicate items Smiley

And you, sir, are missing the point. Smiley
a = b = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1} still exhibits the problem.
 
15,15 15,14 14,15 14,14 13,15 15,13 13,14 12,15 14,13 15,12 12,14 11,15 14,12 15,11 11,14 ... whoops, where is 13,13?
 
My point is that given a sufficiently large, properly constructed input, there are areas of the A x B matrix which neither crawlAB nor crawlBA traverse.
 
--SMQ
IP Logged

--SMQ

Brian
Newbie
*





   


Posts: 4
Re: The hardest interview question I met so far  
« Reply #46 on: Jul 12th, 2006, 3:24pm »
Quote Quote Modify Modify

This is a fun forum.
 
Ok, what about if, when we short-circuited the inner loop:
 
Instead of discarding the rest of the choices, push the pair onto a stack.  Then, everytime we consider a pair from our original algorithm, we check to see if the value on top of the stack is greater.  If so, we print that (and also continue checking the rest of the pairs from the original loop.  If we get a number smaller then the value from the original value, push THAT pair back onto the stack).
 
Checking the value of the stacks top value is fast, and doesn't make the whole thing slower then O(N).  The only problem is making sure we don't end up pushing everything into the array.
 
May not work.
IP Logged
James Fingas
Uberpuzzler
*****





   
Email

Gender: male
Posts: 949
Re: The hardest interview question I met so far  
« Reply #47 on: Jul 25th, 2006, 6:15am »
Quote Quote Modify Modify

Done!
 
This is a pretty good puzzle. You can actually find the cutoff and a description of exactly which pairs are in the solution in less than O(N) time, but outputting all the solutions takes O(N) time, so it doesn't help you overall. This won't be much of a hint, but finding the cutoff point can be done in O(sqrt(N)*log2N) time. Forgive me for not writing code, but this gets pretty nitty-gritty in spots.
 
I think of the problem as a two-dimensional array, S(i,j), of numbers formed by the addition S(i,j) = A[i] + B[j]. A cutoff value C defines a cutoff line through the array so that above the cutoff line S(i,j) >= C and below the line S(i,j) <= C. The particular value C0 that we want gives S(i,j) >= C0 for at least N entries in the array, and S(i,j) <= C0 for at least N(N-1) entries in the array.
 
Here is a better hint, posed as a question:
hidden:
Noting that the problem is to find the partition that contains the highest N entries in the square, devise a method for describing a generic partition containing N items that takes less than O(N) memory.

 
Answer:
hidden:
We describe a partition as a set of "partition points" that specify the corners of the partition, such that the partition is exactly those S(i,j), so that for some partition point with coordinates i0,j0, i <= i0 and j <= j0. Note, however, that the value S(sqrt(N)+1,sqrt(N)+1) cannot be in the result, so the possible partitions are all skinny, spread along the i=1 and j=1 sides of the array. Therefore, we divide the partition into two pieces along the line i=j. Now both pieces of the partition have a height or width of at most sqrt(N). Therefore, we need to allocate space for only 2*sqrt(N) partition points.

 
Continuing:
hidden:
Now our general aim is to find the particular cutoff C0 such that the partition defined by C0 contains exactly 100 items (I'll assume all the items have different values, but no fundamental problem is introduced otherwise). First, note that any C0 must be less than A[1] + B[1], and must be greater than or equal to A[sqrt(N)+1] + B[sqrt(N)+1]. We therefore do a binary search on this range of values, to find C0.
 
As we do our binary search, we must determine how many items are above a trial cutoff C. Doing this is not particularly difficult. First, we move each partition point to the value in its row or column corresponding to the smallest sum larger than C. After all partition points are placed (which takes O(logN) per partition point, for O(sqrt(N)logN) total), we add up the number of elements larger than C in the entire array. This also takes O(sqrt(N)logN) time.
 
Now we just have to try this with various C until we get C0, our ideal value.

 
Now for a wrinkle:
hidden:
My first thought was that it would take O(logN) trials of C to get our ideal value, but that is not entirely true. The number of trials depends on the data type of the values we are comparing, and their distribution in that data type. Using integers, it takes a maximum of 16 steps to find a value with precision 1. But if A[i] and B[j] are real numbers, with large values but potentially small differences between them, it could take much longer. So the order of the naive algorithm is O(sqrt(N)logNlogK), where K is a measure of the granularity of the numbers.

 
A refinement:
hidden:
In order to have a proper binary search, so that we take O(logN) trial values of C, we must use values that are within the array. The way to do this is to keep track of more information. For each partition point, we will keep three values: one value is known to be too high, one value is known to be too low, and one value is our current guess (which will be proven too high or too low). Putting all the partitions together, we now have something like O(Nsqrt(N)) numbers. My initial thought was to choose a new C from the largest row in the array (corresponding to i=1). But what if we find that all the values in the largest row are either too large or too small? Then we must select a value from a different row or column. We might have to repeat this O(sqrt(N)) times, which would give us O(NlogN) performance overall.
 
So instead, we can use a median-of-medians approach by which we choose the median from each potential partition point range, then choose the median of those medians as a new trial C. This should give us fairly good performance. Choosing the median of each set can be done in constant time, and we need to choose O(sqrt(N)) medians, then find the median of those, which can be done by sorting then indexing, taking O(sqrt(N)log(sqrt(N)) = O(sqrt(N)logN) time, which is the same time it takes to evaluate the new C.

 
The second wrinkle:
hidden:
But another wrinkle crops up, because a median-of-median calculation guarantees performance by guaranteeing that a certain fraction of the total values are above the median, and a similar fraction are below the median. Because the potential partition point ranges have different sizes, we can't come up with a similar guarantee for this method. So we actually fail to ensure that we only need to choose O(logN) trial values for C.

 
A further refinement:
hidden:
One simple way to address this shortcoming is to store the size of the potential partition point range along with its median, sorting them as a pair, and then after sorting select a median based on the range sizes. This provides a guarantee of a minimum fraction of values both above and below the median. Now we know it will take only O(logN) trial values of C to find C0. This gives a total time of O(logN(sqrt(N)logN + sqrt(N)logN)) = O(sqrt(N)log2N) time to find C0. It then takes O(N + sqrt(N)) = O(N) to output the values.
IP Logged

Doc, I'm addicted to advice! What should I do?
KWillets
Junior Member
**





   


Posts: 84
Re: The hardest interview question I met so far  
« Reply #48 on: Jan 3rd, 2007, 1:42pm »
Quote Quote Modify Modify

I had some time over the holidays and actually coded a solution to this.  I wanted to do some empirical testing, and it works surprisingly well.  
 
The main points:
 

  • It finds the smallest n pair sums.
  • The solution is returned as a pivot value, less than which there are n pair sums.
  • The pivot is found by binary search between A[0]+B[0] and A[sqrt(n)] +B[sqrt(n)].  Each possible pivot is evaluated by a counting function.
  • The counting function returns as soon as more than n pairs are found below the pivot.
  • The counting function starts on the diagonal and works left and right columnwise.  Since there are only sqrt(n) plausible starting points there, this option seemed better than starting on an axis, but I haven't tested this assertion.

 
Testing on random data seems to show that it's sublinear (on average).  I haven't tried to figure out what bound it should have, but the average number of pairs visited seems to decline relative to n as n increases.  The number of pairs visited is also smaller than n above 1000 or so, declining to .2n around 100,000.  
 
I'll post the code soon; I don't have it with me at the moment.
IP Logged
KWillets
Junior Member
**





   


Posts: 84
Re: The hardest interview question I met so far  
« Reply #49 on: Jan 4th, 2007, 8:35am »
Quote Quote Modify Modify

Here's the code (OS X BSD, but hopefully portable).  Paradoxically, this is C++ (I started with several objects, then deleted them):
 
 
#include <iostream>
 
#include "stdlib.h"
#include "math.h"
 
//  
 
int binsearch_le( int* X, int n, int v )
{
int l = 0, u = n-1, tmp = 0;
 
if( X[u] < v ) return u;
 
while( X[l] != v && l < u-1 )
 
{
//
std::cout << "l: " << l << " u:" << u << "\n";
 
tmp = (l+u)>>1;
 
if( X[tmp] > v )
 
 
u = tmp;
 
else if( X[tmp] < v )
 
 
l = tmp;
 
else
 
 
return tmp;
 
};
 
return l;
}
 
// counter for pair tests in cnt; for testing  
int paircount;
#define COUNTIT {paircount++;}
#define RESETCOUNT {paircount = 0;}
#define OUTPUTCOUNT {std::cout << "\npaircount: " << paircount << "\n"; }
 
// the main operation to count:  add 2 elements and compare to pivot
int pairtest( int a, int b, int v )
{
COUNTIT;
return a+b <= v;
}
 
// count pair sums <= val
// d is start pos on diagonal, max <= val
// return count as soon as > n
int cnt( int val, int *A, int *B, int n, int d )
{
int i,j;
int c = 0;
// left
 
for( i = d, j = d; i >=0 && c <= n; i-- )
 
{
 
while( j+1 < n && pairtest(A[i], B[j+1], val)  )  // peek ahead
 
 
j++ ;
 
c += j+1;
// j is highest <= in column
 
};
 
for( i = d+1, j = d; j >= 0 && i < n && c <= n; i++ )
 
{
 
while( j >= 0 && !pairtest(A[i], B[j], val) )
 
 
j--;
 
c += j+1;
 
}
 
return c;
}
 
// returns pivot for n pair sums < pivot
int bottomn( int *A, int *B, int n  )
{
 
int i,j, a, b;
 
int l, u, lsum, usum, newsum, sqrtn, count =0;
int * diags = 0;
 
RESETCOUNT;
 
l = 0;
u = sqrtn = lround(ceil(sqrt(n)));
 
// pre-calc sums along diagonal up to sqrtn
diags = new int[sqrtn];
for( i = 0; i < sqrtn; i ++ )
 
diags[i] = A[i]+B[i];
 
lsum = A[ l ] + B[ l ];
usum = A[ u ] + B[ u ];
 
// loop:
// interpolate new pivot value in (lsum, usum) (note quadratic growth, d-count/d-pivot = C*pivot)
// find nearest (<=) diagonal pivot position
// find count outward from diag. pivot
// set lsum or usum to pivot based on count </> n
// stop when count = n (or smaller pivot would count < n)
while( lsum < usum -1 )
 
{
 
newsum = (lsum +usum) >> 1;
 
j = binsearch_le( diags, sqrtn, newsum );
 
count = cnt( newsum, A, B, n, j );
//
std::cout << "lsum: " << lsum << " usum: " << usum <<" newsum:" << newsum << " count: " << count << " j: " << j << "\n";
//
std::cout << "diags[j]: " << diags[j] << "\n";
 
if( count > n )
 
 
usum = newsum;
 
else if( count < n )
 
 
lsum = newsum;
 
else
 
 
break;
 
};
 
OUTPUTCOUNT;
 
return newsum;
}
 
int qsort_le(const void *a, const void *b)
{
 return *(int *)a - *(int *)b ;
}
 
int main (int argc, char * const argv[])  
{
int pivot;
    // insert code here...
 
int n;
int i=0;
int j=0;
int *A, *B;
int ntrials = 10;
 
int pairtot=0;
 
if( argc < 2) return -1;
 
n = atoi(argv[1]);
 
A = new int[n];
B = new int[n];
 
for( j = 0; j < ntrials; j++ )
 
{
 
for(i = 0; i < n; i++ )
 
 
{
 
 
A[i] = random() % 1000000000;
 
 
B[i] = random() % 1000000000;
 
 
};
 
 
qsort( A, n, sizeof( *A), &qsort_le );
 
qsort( B, n, sizeof( *B), &qsort_le );
 
 
pivot = bottomn( A, B, n );
 
pairtot += paircount; // op counter
 
 
int ii, jj;
 
int checkcnt = 0;
 
// check result
 
for( ii = 0; ii < n; ii++ )
 
 
for( jj = 0; jj < n && A[ii]+B[jj] <= pivot; jj++ )
 
 
 
checkcnt++;
 
 
std::cout << "n: " << n <<" Pivot:" << pivot << " checkcnt: " << checkcnt << "\n";
 
};
 
std::cout << "avg paircount: " << pairtot/ntrials << " avg/n: "<< pairtot/(n*ntrials*1.0) << "\n";
 
return 0;
}
IP Logged
Pages: 1 2 3  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board