Date: Fri, 16 Apr 2010 16:22:17 -0500 (CDT)
Subject: Re: Regarding Cassandra Scalability
From: "Stu Hood"
To: user@cassandra.apache.org

http://twitter.com/jromeh/status/12295736793

-----Original Message-----
From: "Mike Gallamore"
Sent: Friday, April 16, 2010 3:46pm
To: user@cassandra.apache.org
Subject: Re: Regarding Cassandra Scalability

Also people with 1M followers tend to have "public" tweets, which means
really I think it would be the same as subscribing to an RSS feed or
whatever. You aren't getting a local copy because you will "always" have
access to the tweet, as will everyone else. Also tweets don't change
AFAIK, so no point in having redundant copies.

On 04/16/2010 01:42 PM, Peter Chang wrote:
> Yeah. I wasn't sure if Cassandra was optimized for binary data,
> especially since any site of that size will use a CDN. Interesting
> read though.
>
> I think 1K per tweet is off by an order of magnitude considering they
> only allow 140 characters. Regardless, the number of users with >1MM
> is probably a handful. Also I'm guessing they purge data after a
> certain window (like 30 days for example).
>
> Sent from my iPhone
>
>
> On Apr 16, 2010, at 12:02 PM, gabriele renzi wrote:
>
>> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang wrote:
>>
>>> FB also does pics and movies so 1MB is way off depending on where
>>> they manage such binary data.
>>>
>> apparently not in Cassandra:
>> http://www.facebook.com/note.php?note_id=76191543919
>>
>>> I do agree that 1MB of text alone is a lot of text,
>>> which is more relevant in the case of Twitter. The only large thing
>>> you leave out is denormalization. Every tweet you write is likely
>>> denormalized across your followers to allow for quick read access.
>>>
>> .. but considering many users have _millions_ of followers, this may
>> be quite a bit more data. Assuming 1k per tweet, this would mean one
>> from @aplusk (4.7M followers) would take more than 4 gigabytes of
>> data. Assuming ten tweets a day, in one month he'd produce one TB.
>>
>> I'd say they only store references (increasing number lists can also
>> be encoded very cleverly), or in some other way I'm not smart enough
>> to think of.
>>
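For anyone who wants to check the fan-out arithmetic quoted above, here is a rough back-of-the-envelope sketch. The 1 KB per tweet, 4.7M followers, and ten tweets a day come from the thread. Everything else is an assumption made for illustration: decimal units (1 KB = 1000 bytes), a 30-day month, an 8-byte reference per follower for the "only store references" alternative, and the function names themselves. It says nothing about how Twitter actually stores tweets.

```python
# Back-of-the-envelope fan-out cost, using the figures quoted in the thread.
FOLLOWERS = 4_700_000      # @aplusk follower count cited in the thread
TWEET_BYTES = 1_000        # "1k per tweet", assumed here to mean 1000 bytes
TWEETS_PER_DAY = 10        # "ten tweets a day"
DAYS_PER_MONTH = 30        # assumption: a 30-day month
REFERENCE_BYTES = 8        # hypothetical: one 64-bit tweet ID per follower


def fanout_bytes(followers: int, bytes_per_copy: int) -> int:
    """Bytes written when one tweet is copied once per follower."""
    return followers * bytes_per_copy


def monthly_bytes(per_tweet_bytes: int, tweets_per_day: int, days: int) -> int:
    """Bytes accumulated over a month at a steady tweet rate."""
    return per_tweet_bytes * tweets_per_day * days


# Full denormalization: every follower gets a complete copy of the tweet.
per_tweet_full = fanout_bytes(FOLLOWERS, TWEET_BYTES)
monthly_full = monthly_bytes(per_tweet_full, TWEETS_PER_DAY, DAYS_PER_MONTH)

# Reference-only fan-out: every follower gets just an ID pointing at the tweet.
per_tweet_ref = fanout_bytes(FOLLOWERS, REFERENCE_BYTES)
monthly_ref = monthly_bytes(per_tweet_ref, TWEETS_PER_DAY, DAYS_PER_MONTH)

print(f"full copies: {per_tweet_full / 1e9:.2f} GB per tweet, "
      f"{monthly_full / 1e12:.2f} TB per month")
print(f"references:  {per_tweet_ref / 1e6:.2f} MB per tweet, "
      f"{monthly_ref / 1e9:.2f} GB per month")
```

Under these assumptions the full-copy numbers land in the same range as the "more than 4 gigabytes" and "one TB" estimates above, and the reference-only variant comes out a couple of orders of magnitude smaller, which is the point gabriele was making.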