Date: Sun, 8 Nov 2009 12:10:24 -0800
Subject: Re: CouchDB Twitter Clone Architecture
From: Chris Anderson
To: user@couchdb.apache.org

On Sat, Nov 7, 2009 at 8:07 PM, Paul Davis wrote:
> On Sat, Nov 7, 2009 at 7:09 PM, Michael Bleigh wrote:
>> So I've been thinking through the architecture of a Twitter-esque
>> system in Couch as a kind of thought exercise to get a better handle
>> on some of the more difficult corners of view generation. What would
>> be the most effective manner of creating Twitter-like status streams?
>>

I'd do it like this:

Have a global database where all new tweets are posted. We can call
this "the firehose" or "the public timeline".

For each user, have a replication filter function (coming soon in
trunk) that only replicates the updates from people they follow, to
their own db.

Then each user can replicate their db offline or whatever, and have
full access to the archive of tweets they've followed.

When you follow someone new, changing the filter function won't give
you their historical record of old tweets, but you can always use a
view to fetch that user's history and save it into the follower's db.
I actually prefer not getting a dump of all someone's tweets going
back in time, so maybe it's better to make replicating someone's
updates to my db an on-demand operation.
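
A rough sketch of what such a filter could look like, run locally
against a few sample documents. CouchDB filter functions take
(doc, req) and return true for documents that should be replicated.
The document fields (type, author) and the "following" query parameter
are assumptions made up for this example; nothing in the thread fixes
a schema.

```javascript
// Sketch of a replication filter. Field names and the query
// parameter are assumptions for illustration, not from this thread.
// CouchDB calls a filter with (doc, req) and replicates the doc
// only when the function returns true.
function followFilter(doc, req) {
  // Hypothetical schema: { type: "tweet", author: "...", text: "..." }
  if (doc.type !== "tweet") {
    return false;
  }
  // Hypothetical parameter: comma-separated list of followed users.
  var following = (req.query.following || "").split(",");
  return following.indexOf(doc.author) !== -1;
}

// Local simulation of what replication would copy from the firehose.
var firehose = [
  { _id: "t1", type: "tweet", author: "paul", text: "views!" },
  { _id: "t2", type: "tweet", author: "aplusk", text: "hello" },
  { _id: "f1", type: "follow", follower: "chris", followed: "paul" }
];
var req = { query: { following: "paul,michael" } };
var replicated = firehose.filter(function (doc) {
  return followFilter(doc, req);
});
console.log(replicated.map(function (doc) { return doc._id; }));
```

In a real setup the function would presumably live in a design
document's filters section and be named in the replication request,
with the follow list passed as a parameter. The exact syntax depends
on the CouchDB version.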

Also nice with this is that users who don't visit the site (the
people who sign up and never come back) won't cost anything, because
you don't have to run the filtered replication until a user hits the
site.

>> My initial feeling is to store the followings of a given user as an
>> array in the user's document and also have a view that compiles the
>> followers of a given user. When a user posts a status update, the
>> application would fetch the follower list from that view and simply
>> attach it to the status document. It is then a simple matter of a
>> composite key map of a given status document to all of the users
>> stored within to create a given user's home timeline.
>>
>> Where this breaks down is your @aplusk scenario. Storing a 3.5 million
>> entry array with a document is obviously going to cripple performance
>> (at least I would think it would) as well as take up massive disk
>> space (I estimated around 7MB for a single JSON status with 1MM
>> followers).
>>
>> So if this solution isn't scalable to millions of users, what's an
>> architecture that would be? How do you compose the user's tweet stream
>> such that it can be pulled in an efficient manner?
>>
>> Just trying to start a discussion to help me better understand
>> document-oriented architecture, feel free to ignore me!
>>
>> Michael Bleigh
>>
>
> Michael,
>
> It's hard to give too much of a description of what the best would be
> like, but off the cuff after more experience than the last time I made
> a comment on the "How does tweetcouch work" meme:
>
> Store each follower relation as a document. Offline when a new tweet
> comes in, look at a view that does "emit(person_being_followed,
> person_following)" and copy that tweet to the "person_following"'s
> stream.
>
> It may seem odd, but if you watch twitter streams closely you can see
> that they're actually a pretty good case of "eventually consistent".
> It's really noticeable when you're firing back and forth right quick
> between 2 or more people. Twitter is an interesting study because even
> if you send a tweet, and then 30 seconds later another tweet shows up
> as having arrived before you sent yours, humans don't really care. The
> async nature is not sensitive as long as we get a notice within
> reasonable time. A failing case is the example of getting a text
> message three days later. I just realized I'm still typing, so let me
> know if that answered anything.
>
> HTH,
> Paul Davis
>

--
Chris Anderson
http://jchrisa.net
http://couch.io
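
For reference, Paul's quoted suggestion (one document per follower
relation, a view keyed on the person being followed, and an offline
copy step) could be sketched roughly like this. The document shape
(type, followed, follower), the emit() stand-in, and the fanOut helper
are all assumptions for illustration; only the
emit(person_being_followed, person_following) idea comes from the
thread.

```javascript
// Sketch of the follower-relation view plus an offline fan-out step.
// Document shapes and helper names are assumptions for illustration.
var rows = [];
function emit(key, value) { // local stand-in for CouchDB's emit()
  rows.push({ key: key, value: value });
}

// Map function: one row per follow relation, keyed by who is followed.
function mapFollowers(doc) {
  if (doc.type === "follow") {
    emit(doc.followed, doc.follower);
  }
}

var followDocs = [
  { type: "follow", followed: "paul", follower: "chris" },
  { type: "follow", followed: "paul", follower: "michael" },
  { type: "follow", followed: "chris", follower: "paul" }
];
followDocs.forEach(mapFollowers);

// Offline step: copy a new tweet into each follower's stream.
function fanOut(tweet, viewRows, streams) {
  viewRows.forEach(function (row) {
    if (row.key === tweet.author) {
      (streams[row.value] = streams[row.value] || []).push(tweet);
    }
  });
  return streams;
}

var streams = fanOut({ author: "paul", text: "HTH" }, rows, {});
console.log(Object.keys(streams));
```

Because each relation is its own small document, a user with millions
of followers adds millions of view rows rather than one huge array on
every status document.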