wu :: forums
(http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)

(Message started by: spur on May 23rd, 2012, 12:49pm)

Title: design the twitter architecture
Post by spur on May 23rd, 2012, 12:49pm

When an interviewer asks you to design Twitter, what do they expect to get out of the candidate?
The question was: how would you show a user the latest 5 tweets from the people they follow? Take the scalability of the problem into account.

Title: Re: design the twitter architecture
Post by spur on Jun 8th, 2012, 12:44pm

Hey, did I put the question in the wrong thread?

OK, let me explain the solution I gave for the question.
The aim is to show the latest 5 tweets.
I would use two strategies: one for the people I follow who have a large number of subscribers (famous tweeters), and another for those who are more like friends and do not have many subscribers (non-famous).
For non-famous users, use the observer (listener) pattern and push each new tweet onto every subscriber's stack.
For each famous user, keep a single stack of that user's own tweets.
When rendering the latest tweets, read the requesting user's own stack plus the stacks of all the famous people that user follows, and take the latest 5 tweets across all of them.
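
To make the idea concrete, here is a minimal in-memory sketch of that two-path scheme. It is only an illustration under stated assumptions: the `Timeline` class, the `FAMOUS_THRESHOLD` cutoff, and the integer counter used in place of timestamps are all made up for the example, and a real system would keep these stacks in a distributed store rather than in one process.

```python
import heapq
from collections import defaultdict, deque
from itertools import count

FAMOUS_THRESHOLD = 3  # hypothetical cutoff; a real system would tune this
FEED_SIZE = 5         # the question asks for the latest 5 tweets


class Timeline:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users who follow them
        self.following = defaultdict(set)  # user -> authors they follow
        # Per-user stack, filled by pushes from non-famous authors (newest first).
        self.inbox = defaultdict(lambda: deque(maxlen=FEED_SIZE))
        # Per-author stack, used only while the author is famous (newest first).
        self.outbox = defaultdict(lambda: deque(maxlen=FEED_SIZE))
        self._clock = count()  # stand-in for real timestamps

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_famous(self, author):
        return len(self.followers[author]) >= FAMOUS_THRESHOLD

    def tweet(self, author, text):
        item = (next(self._clock), author, text)
        if self.is_famous(author):
            # Famous: one write; followers pull it at read time.
            self.outbox[author].appendleft(item)
        else:
            # Non-famous: observer-style push to every subscriber's stack.
            for user in self.followers[author]:
                self.inbox[user].appendleft(item)

    def latest(self, user):
        stacks = [self.inbox[user]]
        stacks += [self.outbox[a] for a in self.following[user] if self.is_famous(a)]
        # Each stack holds at most FEED_SIZE items, so this merge stays small.
        return heapq.nlargest(FEED_SIZE, (t for s in stacks for t in s))
```

Capping every stack at 5 entries is enough here because the overall top 5 can never need more than 5 items from any single source. The sketch ignores unfollows, back-filling a feed when you follow a non-famous user, and authors moving between the two groups.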

Title: Re: design the twitter architecture
Post by towr on Jun 9th, 2012, 4:53am

on 06/08/12 at 12:44:39, spur wrote:

Hey, did I put the question in the wrong thread?

No, it's in the right place. But I guess few people have a good idea to share. I don't really know enough about how to optimize performance in such large-scale systems.
Separating the popular tweeters into their own group sounds like a good idea, since their tweets will be requested often, but I don't think it goes far toward answering the question. How do you efficiently find the 5 most recent posts among dozens of tweeters whose data may be spread across dozens of servers? It would seem a waste to fetch the last 5 tweets of each of them, sort those, and return the 5 latest overall.
Both time efficiency and bandwidth are concerns here, as is how to organize the data in the first place.
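
One possible way to avoid fetching 5 tweets from every source is a lazy k-way merge: look only at the newest tweet of each followed author, and pull an author's next tweet only after their current one has been emitted. The sketch below is not something proposed in this thread, just an illustration. It assumes each author's tweets can be streamed newest-first; the `newest_first` generator and `fetch_log` are hypothetical stand-ins for remote reads.

```python
import heapq
from itertools import islice


def newest_first(author, timestamps, fetch_log):
    """Hypothetical per-author stream; each yielded item stands for one remote fetch."""
    for ts in sorted(timestamps, reverse=True):
        fetch_log.append((author, ts))
        yield (ts, author)


def latest_tweets(streams, k=5):
    # heapq.merge is lazy: it reads one item per stream up front, then one
    # more from a stream each time that stream's head has been emitted.
    merged = heapq.merge(*streams, key=lambda t: t[0], reverse=True)
    return list(islice(merged, k))


fetch_log = []
data = {"alice": [1, 4, 9, 12], "bob": [2, 3, 11], "carol": [5, 6, 7, 8, 10]}
streams = [newest_first(a, ts, fetch_log) for a, ts in data.items()]
top5 = latest_tweets(streams)
```

With n sources this reads roughly one item per source plus a few more, instead of 5 per source. The trade-off is more round trips if every read is a separate request, so in practice one might fetch in small batches; which is better depends on the statistics of the problem.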

And you can't really give a good answer without some idea of the statistics of the problem: what is the distribution of the number of people someone follows, how many requests do they make, etc.