From: Peter Chang
Date: Fri, 16 Apr 2010 14:42:18 -0600
Subject: Re: Regarding Cassandra Scalability
To: "user@cassandra.apache.org"

Yeah. I wasn't sure whether Cassandra was optimized for binary data, especially since any site of that size will use a CDN. Interesting read, though.

I think 1K per tweet is off by an order of magnitude, considering they only allow 140 characters. Regardless, the number of users with more than 1MM followers is probably a handful. Also, I'm guessing they purge data after a certain window (30 days, for example).

On Apr 16, 2010, at 12:02 PM, gabriele renzi wrote:

> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang wrote:
>> FB also does pics and movies so 1MB is way off depending on where they
>> manage such binary data.
>
> apparently not in cassandra http://www.facebook.com/note.php?note_id=76191543919
>
>> I do agree that 1MB of text alone is a lot of text which is more
>> relevant in the case of Twitter. The only large thing you leave out is
>> denormalization. Every tweet you write is likely denormalized across
>> your followers to allow for quick read access.
>
> .. but considering many users have _millions_ of followers, this may
> be quite a bit more data. Assuming 1k per tweet, this would mean one
> from @aplusk (4.7M followers) would take more than 4 gigabytes of
> data. Assuming ten tweets a day, in one month he'd produce one TB.
>
> I'd say they only store references (increasing number lists can also
> be encoded very cleverly), or in some other way I'm not smart enough
> to think of.
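
For anyone checking the numbers in this thread, here is a rough back-of-the-envelope sketch in Python. It only multiplies the figures quoted above (4.7M followers, ten tweets a day, a 30-day month) and compares three per-copy sizes: the 1k guess, a bare 140-byte tweet, and a reference. The function name fanout_bytes and the 8-byte reference size are assumptions made for illustration; nothing here describes how Twitter actually stores data.

```python
def fanout_bytes(followers: int, bytes_per_copy: int, tweets: int = 1) -> int:
    """Bytes written if every tweet is copied once per follower."""
    return followers * bytes_per_copy * tweets


FOLLOWERS = 4_700_000          # @aplusk's follower count quoted in the thread
TWEETS_PER_MONTH = 10 * 30     # ten tweets a day over a 30-day month

# Full denormalized copies at the 1k-per-tweet estimate.
per_tweet_1k = fanout_bytes(FOLLOWERS, 1_000)
per_month_1k = fanout_bytes(FOLLOWERS, 1_000, TWEETS_PER_MONTH)

# Same fan-out if a tweet is closer to 140 bytes of text.
per_month_140 = fanout_bytes(FOLLOWERS, 140, TWEETS_PER_MONTH)

# Same fan-out if only a reference is stored per follower.
# The 8-byte ID size is a hypothetical value, not a known Twitter detail.
per_month_ref = fanout_bytes(FOLLOWERS, 8, TWEETS_PER_MONTH)

GB = 10**9
print(f"1k copies, one tweet:   {per_tweet_1k / GB:.1f} GB")
print(f"1k copies, one month:   {per_month_1k / GB:.1f} GB")
print(f"140 B copies, one month: {per_month_140 / GB:.1f} GB")
print(f"8 B refs, one month:    {per_month_ref / GB:.2f} GB")
```

Under these assumptions, the 1k estimate gives about 4.7 GB per tweet and about 1.4 TB per month, which matches the "more than 4 gigabytes" and "one TB" figures above. Dropping to 140 bytes per copy or to small references shrinks the monthly total by roughly one and two orders of magnitude respectively.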