From: Peter Chang
Date: Fri, 16 Apr 2010 10:41:16 -0600
To: user@cassandra.apache.org
Subject: Re: Regarding Cassandra Scalability

FB also does pics and movies, so 1MB is way off, depending on where they manage such binary data. I do agree that 1MB of text alone is a lot of text, which is more relevant in the case of Twitter. The one large thing you leave out is denormalization. Every tweet you write is likely denormalized across your followers to allow quick read access.
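The denormalization point can be illustrated with a minimal fan-out-on-write sketch. It is purely hypothetical: `Timelines`, `follow`, `post` and `read` are invented names, and plain Python dicts stand in for whatever storage Twitter actually uses. Each post is copied into every follower's timeline, so a read is a single lookup while storage grows with the follower count.

```python
from collections import defaultdict


class Timelines:
    """Toy fan-out-on-write model (hypothetical; not Twitter's or Cassandra's API)."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> set of follower names
        self.timelines = defaultdict(list)  # reader -> list of (author, text)

    def follow(self, reader, author):
        self.followers[author].add(reader)

    def post(self, author, text):
        # Denormalize: write a copy of the tweet into every follower's timeline.
        for reader in self.followers[author]:
            self.timelines[reader].append((author, text))
        # Number of copies stored for this single tweet.
        return len(self.followers[author])

    def read(self, reader):
        # A read is one lookup; no join across authors is needed.
        return list(self.timelines[reader])


t = Timelines()
for name in ("alice", "bob", "carol"):
    t.follow(name, "dave")
copies = t.post("dave", "hello")
print(copies)           # 3
print(t.read("alice"))  # [('dave', 'hello')]
```

The trade-off is the one described above: each tweet is stored once per follower, so an account with many followers costs far more than the raw size of its tweets.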



On Apr 16, 2010, at 10:17 AM, Mike Gallamore <mike.e.gallamore@googlemail.com> wrote:

On 04/16/2010 01:38 AM, dir dir wrote:
I hear Facebook.com and Twitter.com are using the Cassandra database. In my opinion, Facebook and Twitter have hundreds of TB of data, because their users number in the hundreds of millions.
I think you might be forgetting just how tiny tweets are. The last numbers I heard, Twitter gets 55,000,000 messages a day. They've been around for roughly 4 years. Even assuming they always had that number of messages (which isn't the case), that would still only be roughly 11TB of data if everyone sent the maximum tweet length. Sure, add a bit to each message for a timestamp and the user who posted it, but I'd still be surprised if every tweet including metadata came to much more than 20TB.
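The 11TB figure can be checked with a short back-of-envelope calculation. Assumptions not stated in the thread: 365-day years, one byte per character for a 140-character tweet, and decimal terabytes (10**12 bytes). The last line asks how many bytes per tweet a 20TB total would allow, which bounds the metadata overhead.

```python
# Back-of-envelope check of the tweet-storage estimate.
# Assumptions: 365-day years, 1 byte per character, decimal TB (10**12 bytes).
TWEETS_PER_DAY = 55_000_000  # figure quoted in the thread
DAYS_PER_YEAR = 365
YEARS = 4                    # "roughly 4 years"
MAX_TWEET_BYTES = 140        # 140-character limit at one byte per character
TB = 10**12

tweets = TWEETS_PER_DAY * DAYS_PER_YEAR * YEARS
raw_bytes = tweets * MAX_TWEET_BYTES
# Bytes per tweet that a 20TB total would allow (payload plus metadata).
budget_per_tweet = 20 * TB // tweets

print(tweets)                     # 80300000000
print(f"{raw_bytes / TB:.1f} TB")  # 11.2 TB
print(budget_per_tweet)           # 249
```

So 20TB leaves roughly 109 bytes of metadata per tweet on top of the 140-byte payload, before any replication or denormalization.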

Similarly with Facebook. I think it is the friend list search that they really used it for. Regardless, how much text is on your Facebook page? Maybe 1MB if you are a very, very active user. I wouldn't think they would load the images directly into Cassandra, but I could be wrong. I suspect they would pull an old database trick: store the images on the filesystem and have the "database" store just the path to each one.
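The "old database trick" can be sketched as follows. This is a hypothetical illustration: `ImageStore`, `save` and `load` are invented names, a Python dict stands in for the database table, and nothing here reflects how Facebook actually stores photos. The large binary goes to the filesystem; the row keeps only a short path string.

```python
import tempfile
from pathlib import Path


class ImageStore:
    """Hypothetical sketch: image bytes on disk, only the path in the 'database'."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.rows = {}  # stands in for a database table: photo_id -> file path

    def save(self, photo_id, data):
        # Write the binary payload to the filesystem.
        path = self.root / f"{photo_id}.img"
        path.write_bytes(data)
        # The database row holds only the path, not the image itself.
        self.rows[photo_id] = str(path)
        return path

    def load(self, photo_id):
        # Look up the path in the "database", then read the file from disk.
        return Path(self.rows[photo_id]).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    store = ImageStore(tmp)
    store.save("photo-1", b"fake image bytes")
    print(store.rows["photo-1"].endswith("photo-1.img"))  # True
    print(store.load("photo-1") == b"fake image bytes")   # True
```

The database then grows with the number of photos rather than their size, which is why binary data would barely register in a Cassandra size estimate under this scheme.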

There could be a lot of other data floating around, some of which might be in Cassandra, but I don't know. Just the core data that the sites have said they use Cassandra for is, I think, probably in the very low tens of TB.

Lastly, sites like Facebook and Twitter count hundreds of millions of users, but a lot of them are people who sign in, send a few tweets or connect with a few friends, and then never use the site again. When a company needs to make itself look valuable, it counts every single person who ever logged in, even if they did so only once or haven't used the site in years. They want to quote large numbers because advertisers and potential acquirers base the price on those numbers.

Dir.


On Fri, Apr 16, 2010 at 1:28 PM, Linton N <gabrialmarialinton@gmail.com> wrote:
Hi,
I have been working with Hadoop for the past year, but I am quite new to Cassandra. I would like to clarify a few things regarding the scalability of Cassandra. Can it scale up to TBs of data?

Please provide me with some links regarding this.


--
With Love
Lin N

