cassandra-user mailing list archives

From "Stu Hood" <stu.h...@rackspace.com>
Subject Re: Regarding Cassandra Scalability
Date Fri, 16 Apr 2010 21:22:17 GMT
http://twitter.com/jromeh/status/12295736793

-----Original Message-----
From: "Mike Gallamore" <mike.e.gallamore@googlemail.com>
Sent: Friday, April 16, 2010 3:46pm
To: user@cassandra.apache.org
Subject: Re: Regarding Cassandra Scalability

Also, people with 1M followers tend to have "public" tweets, so I think
following them is really the same as subscribing to an RSS feed or
whatever. You aren't getting a local copy, because you will "always" have
access to the tweet, as will everyone else. Also, tweets don't change
AFAIK, so there's no point in having redundant copies.
On 04/16/2010 01:42 PM, Peter Chang wrote:
> Yeah. I wasn't sure if Cassandra was optimized for binary data
> especially since any site of that size will use a CDN. Interesting
> read though.
>
> I think 1K per tweet is off by an order of magnitude considering they
> only allow 140 characters. Regardless, the number of users with > 1MM
> followers is probably a handful. Also, I'm guessing they purge data
> after a certain window (30 days, for example).
>
> Sent from my iPhone
>
>
> On Apr 16, 2010, at 12:02 PM, gabriele renzi <rff.rff@gmail.com> wrote:
>
>    
>> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang <peter78@gmail.com> wrote:
>>      
>>> FB also does pics and movies so 1MB is way off depending on where they
>>> manage such binary data.
>>>        
>> Apparently not in Cassandra: http://www.facebook.com/note.php?note_id=76191543919
>>
>>      
>>> I do agree that 1MB of text alone is a lot of text,
>>> which is more relevant in the case of Twitter. The only large thing you
>>> leave out is denormalization. Every tweet you write is likely denormalized
>>> across your followers to allow for quick read access.
>>>        
>> .. but considering many users have _millions_ of followers, this may
>> be quite a bit more data. Assuming 1k per tweet, a single tweet
>> from @aplusk (4.7M followers) would take more than 4 gigabytes of
>> data. Assuming ten tweets a day, in one month he'd produce more than one TB.
>>
>> I'd say they only store references (increasing number lists can also
>> be encoded very cleverly), or use some other approach I'm not smart
>> enough to think of.
>>      
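
The fan-out arithmetic quoted above, and the idea of storing references instead of full copies, can be sketched roughly as follows. This is only a back-of-the-envelope illustration, not a description of what Twitter does. The follower count, the 1k tweet size, and the ten tweets a day come from the thread; the 8-byte reference size, the 30-day month, the function names, and the choice of delta plus base-128 varint encoding as one possible reading of "encoded very cleverly" are all assumptions made for this example.

```python
# Back-of-the-envelope sketch. Figures marked "thread" come from the
# discussion; everything else is an assumption made for illustration.
FOLLOWERS = 4_700_000   # thread: @aplusk's follower count
TWEET_BYTES = 1_000     # thread: "1k per tweet", read here as 1000 bytes
TWEETS_PER_DAY = 10     # thread
DAYS = 30               # assumption: a 30-day month
ID_BYTES = 8            # assumption: a reference is a 64-bit tweet ID


def fanout_bytes(followers, bytes_per_entry, tweets=1):
    """Bytes written if every follower's timeline gets one entry per tweet."""
    return followers * bytes_per_entry * tweets


def encode_increasing(ids):
    """Encode a strictly increasing list of non-negative ints as the
    gaps between neighbours, each gap written as a base-128 varint."""
    out = bytearray()
    prev = 0
    for value in ids:
        gap = value - prev
        prev = value
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)                      # final byte, continuation bit clear
    return bytes(out)


def decode_increasing(data):
    """Inverse of encode_increasing."""
    ids, current, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            ids.append(current)
            gap, shift = 0, 0
    return ids


if __name__ == "__main__":
    tweets_per_month = TWEETS_PER_DAY * DAYS
    print("one tweet, full copies:", fanout_bytes(FOLLOWERS, TWEET_BYTES))
    print("one month, full copies:",
          fanout_bytes(FOLLOWERS, TWEET_BYTES, tweets_per_month))
    print("one month, 8-byte refs:",
          fanout_bytes(FOLLOWERS, ID_BYTES, tweets_per_month))

    # Illustrative IDs only; the first is the status ID from the link
    # at the top of this message.
    timeline = [12295736793, 12295736800, 12295737000]
    packed = encode_increasing(timeline)
    print("packed timeline bytes:", len(packed), "vs raw:", ID_BYTES * len(timeline))
```

Under these assumptions, full copies come to 4.7 GB per tweet and about 1.41 TB per month, while 8-byte references come to about 11.3 GB per month before any compression. Delta encoding helps most when the IDs in a timeline are close together, since small gaps fit in one or two bytes.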



