From: Paul Davis
Date: Sat, 7 Nov 2009 23:07:05 -0500
Subject: Re: CouchDB Twitter Clone Architecture
To: user@couchdb.apache.org

On Sat, Nov 7, 2009 at 7:09 PM, Michael Bleigh wrote:
> So I've been thinking through the architecture of a Twitter-esque
> system in Couch as a kind of thought exercise to get a better handle
> on some of the more difficult corners of view generation. What would
> be the most effective manner of creating Twitter-like status streams?
>
> My initial feeling is to store the followings of a given user as an
> array in the user's document and also have a view that compiles the
> followers of a given user. When a user posts a status update, the
> application would fetch the follower list from that view and simply
> attach it to the status document. It is then a simple matter of a
> composite key map of a given status document to all of the users
> stored within to create a given user's home timeline.
>
> Where this breaks down is your @aplusk scenario. Storing a 3.5 million
> entry array with a document is obviously going to cripple performance
> (at least I would think it would) as well as take up massive disk
> space (I estimated around 7MB for a single JSON status with 1MM
> followers).
>
> So if this solution isn't scalable to millions of users, what's an
> architecture that would be? How do you compose the user's tweet stream
> such that it can be pulled in an efficient manner?
>
> Just trying to start a discussion to help me better understand
> document-oriented architecture, feel free to ignore me!
>
> Michael Bleigh
>

Michael,

It's hard to give too much of a description of what the best design would look like, but here's an off-the-cuff answer, informed by more experience than I had the last time I commented on the "How does tweetcouch work" meme:

Store each follower relation as its own document. When a new tweet comes in, have an offline process look at a view that does "emit(person_being_followed, person_following)" and copy that tweet into each "person_following"'s stream.

It may seem odd, but if you watch Twitter streams closely you can see that they're actually a pretty good case of "eventually consistent". It's really noticeable when you're firing tweets back and forth quickly between two or more people.

Twitter is an interesting study because even if you send a tweet and then, 30 seconds later, another tweet shows up as having arrived before yours, humans don't really care. The asynchronous delivery doesn't bother anyone as long as we get notified within a reasonable time. A failing case would be getting a text message three days later.

I just realized I'm still typing, so let me know if that answered anything.

HTH,
Paul Davis
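To make the fan-out idea above concrete, here is a minimal sketch in JavaScript, the language of CouchDB map functions. Everything beyond the `emit(person_being_followed, person_following)` view is an assumption for illustration: the document shapes (`type`, `follower`, `followee`, `owner`, `tweet_id`, `created_at`) and the helpers `buildView`, `fanOut` and `homeTimeline` are hypothetical, and `emit` is stubbed so the map functions run under plain Node.js instead of inside CouchDB. In a real deployment the worker would presumably query the followers view over HTTP with `?key="bob"` and save the stream entries via `_bulk_docs`, and a reader would fetch a timeline with a descending range query such as `startkey=["alice",{}]&endkey=["alice"]&descending=true`.

```javascript
// Minimal stand-in for CouchDB's view server: emit() just collects rows,
// so the map functions below run unchanged under Node.js.
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Run a map function over every document and return the emitted rows
// (unsorted; real CouchDB views are sorted by key using view collation).
function buildView(mapFn, docs) {
  rows = [];
  docs.forEach(function (doc) { mapFn(doc); });
  return rows;
}

// "followers" view: one row per follow-relation document, keyed by the
// person being followed, with the follower as the value.
const followersMap = function (doc) {
  if (doc.type === "follow") {
    emit(doc.followee, doc.follower);
  }
};

// "timeline" view over stream entries, keyed by [owner, created_at].
const timelineMap = function (doc) {
  if (doc.type === "stream") {
    emit([doc.owner, doc.created_at], doc.tweet_id);
  }
};

// Offline fan-out (hypothetical helper): copy a new tweet into the stream
// of everyone who follows its author, i.e. the followers view at key=author.
function fanOut(tweet, followerRows) {
  return followerRows
    .filter(function (row) { return row.key === tweet.author; })
    .map(function (row) {
      return {
        type: "stream",
        owner: row.value,
        tweet_id: tweet._id,
        created_at: tweet.created_at
      };
    });
}

// Read one user's home timeline, newest first, mimicking a descending
// range query on the timeline view (ISO timestamps compare as strings).
function homeTimeline(timelineRows, user) {
  return timelineRows
    .filter(function (row) { return row.key[0] === user; })
    .sort(function (a, b) {
      return a.key[1] < b.key[1] ? 1 : a.key[1] > b.key[1] ? -1 : 0;
    })
    .map(function (row) { return row.value; });
}

// Example data: alice and carol follow bob; bob follows alice.
const docs = [
  { type: "follow", follower: "alice", followee: "bob" },
  { type: "follow", follower: "carol", followee: "bob" },
  { type: "follow", follower: "bob", followee: "alice" }
];
const followerRows = buildView(followersMap, docs);

const t1 = { type: "tweet", _id: "t1", author: "bob", text: "hello", created_at: "2009-11-08T04:00:00Z" };
const t2 = { type: "tweet", _id: "t2", author: "bob", text: "again", created_at: "2009-11-08T04:05:00Z" };
const streamDocs = fanOut(t1, followerRows).concat(fanOut(t2, followerRows));
const timelineRows = buildView(timelineMap, streamDocs);

console.log(streamDocs.map(function (d) { return d.owner; })); // [ 'alice', 'carol', 'alice', 'carol' ]
console.log(homeTimeline(timelineRows, "alice")); // [ 't2', 't1' ]
```

The trade-off this sketch illustrates: because every follow is its own small document, a user with millions of followers never has to fit into a single array. The cost moves to the background worker, which writes one stream entry per follower, while each reader only ever touches the rows under their own key. That write amplification is tolerable precisely because, as described above, nobody minds if a tweet lands in a stream a few seconds late.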