incubator-couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: CouchDB Twitter Clone Architecture
Date Sun, 08 Nov 2009 20:10:24 GMT
On Sat, Nov 7, 2009 at 8:07 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> On Sat, Nov 7, 2009 at 7:09 PM, Michael Bleigh <michael@intridea.com> wrote:
>> So I've been thinking through the architecture of a Twitter-esque
>> system in Couch as a kind of thought exercise to get a better handle
>> on some of the more difficult corners of view generation. What would
>> be the most effective manner of creating Twitter-like status streams?
>>

I'd do it like this:

Have a global database where all new tweets are posted. We can call
this "the firehose" or "the public timeline".

For each user, have a replication filter function (coming soon in
trunk) that only replicates the updates from people they follow, to
their own db. Then each user can replicate their db offline or
whatever, and have full access to the archive of tweets they've
followed.
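
A rough sketch of what such a filter could look like. Filtered
replication is not in a release yet, so treat the details as
assumptions: the (doc, req) signature, the "type" and "author" fields on
tweet documents, and a "following" query parameter carrying a
JSON-encoded list of usernames are all hypothetical here.

```javascript
// Hypothetical tweet shape: { _id: "...", type: "tweet", author: "alice", text: "..." }
// Return true to let a document through to the user's db, false to skip it.
function followedOnly(doc, req) {
  // Assumption: the follow list is passed as a JSON array in req.query.following.
  var following = JSON.parse(req.query.following || "[]");
  return doc.type === "tweet" && following.indexOf(doc.author) !== -1;
}
```

Passing the follow list as a parameter keeps the function itself
unchanged when a user follows or unfollows someone.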

When you follow someone new, changing the filter function won't give
you their historical record of old tweets, but you can always use a
view to fetch that user's history and save it into the follower's db. I
actually prefer not getting a dump of all someone's tweets going back
in time, so maybe it's better to make replicating someone's updates to
my db an on-demand operation.
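
One way the history lookup could work, as a sketch only: a map function
keyed by [author, timestamp], plus the key range that selects a single
author. The "type", "author" and "created_at" field names are
assumptions, and the local emit() is just a stand-in for the one the
view server provides so the snippet runs by itself.

```javascript
// Stand-in for the emit() supplied by CouchDB's view server.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map function: index tweets by [author, timestamp] so that one
// author's history forms a contiguous key range in the view.
function tweetsByAuthor(doc) {
  if (doc.type === "tweet") {
    emit([doc.author, doc.created_at], null);
  }
}

// Query parameters selecting every tweet by one author. In view
// collation an object sorts after any string or number, so
// [author, {}] works as an upper bound for that author's keys.
function historyRange(author) {
  return { startkey: [author], endkey: [author, {}], include_docs: true };
}
```

The docs returned for that range can then be saved into the follower's
db whenever they ask for the backlog.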

Another nice thing about this is that users who don't visit the site (the
people who signup and never come back) won't cost anything, because
you don't have to run the filtered replication until a user hits the
site.
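
To make the on-demand part concrete, here is a sketch of the
replication request an app might build when a user shows up. The
database names ("firehose", "timeline-<user>"), the filter name
"app/followedOnly" and the "following" parameter are all made up for
illustration, and the "filter" / "query_params" fields assume filtered
replication lands in roughly that shape.

```javascript
// Build the JSON body for a POST to /_replicate for one user.
// Nothing is replicated until the app actually sends this request,
// so inactive users cost nothing.
function replicationRequest(user, following) {
  return {
    source: "firehose",                  // hypothetical global tweet db
    target: "timeline-" + user,          // hypothetical per-user db name
    filter: "app/followedOnly",          // hypothetical design doc / filter name
    query_params: { following: JSON.stringify(following) }
  };
}
```

The app would call this when the user logs in or loads their timeline,
not on a schedule.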

>> My initial feeling is to store the followings of a given user as an
>> array in the user's document and also have a view that compiles the
>> followers of a given user. When a user posts a status update, the
>> application would fetch the follower list from that view and simply
>> attach it to the status document. It is then a simple matter of a
>> composite key map of a given status document to all of the users
>> stored within to create a given user's home timeline.
>>
>> Where this breaks down is your @aplusk scenario. Storing a 3.5 million
>> entry array with a document is obviously going to cripple performance
>> (at least I would think it would) as well as take up massive disk
>> space (I estimated around 7MB for a single JSON status with 1MM
>> followers).
>>
>> So if this solution isn't scalable to millions of users, what's an
>> architecture that would be? How do you compose the user's tweet stream
>> such that it can be pulled in an efficient manner?
>>
>> Just trying to start a discussion to help me better understand
>> document-oriented architecture, feel free to ignore me!
>>
>> Michael Bleigh
>>
>
> Michael,
>
> It's hard to give too much of a description of what the best would be
> like, but off the cuff after more experience than the last time I made
> a comment on the "How does tweetcouch work" meme:
>
> Store each follower relation as a document. Offline when a new tweet
> comes in, look at a view that does "emit(person_being_followed,
> person_following)" and copy that tweet to the "person_following"'s
> stream.
>
> It may seem odd, but if you watch twitter streams closely you can see
> that they're actually a pretty good case of "eventually consistent".
> It's really noticeable when you're firing back and forth right quick
> between 2 or more people. Twitter is an interesting study because even
> if you send a tweet, and then 30 seconds later another tweet shows up
> as having arrived before you sent yours, humans don't really care. The
> async nature is not sensitive as long as we get a notice within
> reasonable time. A failing case is the example of getting a text
> message three days later. I just realized I'm still typing, so let me
> know if that answered anything.
>
> HTH,
> Paul Davis
>
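
For comparison, the fan-out approach Paul describes above could be
sketched like this. The follow-document shape, the copied-tweet shape
and the "follower:tweetId" id scheme are assumptions for illustration,
and emit() is again a local stand-in for the view server's.

```javascript
// Stand-in for the emit() supplied by CouchDB's view server.
var followerRows = [];
function emit(key, value) { followerRows.push({ key: key, value: value }); }

// Map function over follow documents, one document per relation.
// Hypothetical shape: { type: "follow", follower: "bob", followed: "alice" }
function followersOf(doc) {
  if (doc.type === "follow") {
    emit(doc.followed, doc.follower);
  }
}

// Offline step: given the view rows, produce one copy of the tweet
// for each follower's stream.
function fanOut(tweet, rows) {
  return rows
    .filter(function (row) { return row.key === tweet.author; })
    .map(function (row) {
      return {
        _id: row.value + ":" + tweet._id, // hypothetical id scheme
        stream: row.value,
        author: tweet.author,
        text: tweet.text
      };
    });
}
```

The trade-off against filtered replication is that this writes one
document per follower at post time, whether or not that follower ever
comes back to read it.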



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
