incubator-cassandra-user mailing list archives

From Peter Chang <pete...@gmail.com>
Subject Re: Regarding Cassandra Scalability
Date Fri, 16 Apr 2010 20:42:18 GMT
Yeah. I wasn't sure if Cassandra was optimized for binary data,
especially since any site of that size will use a CDN. Interesting
read, though.

I think 1K per tweet is off by an order of magnitude, considering they
only allow 140 characters. Regardless, the number of users with more
than 1MM followers is probably a handful. Also, I'm guessing they purge
data after a certain window (30 days, for example).
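The per-tweet size matters because of the fan-out described in the quoted message: if each tweet is copied into every follower's timeline, storage grows as tweet size × followers × tweets. The Python sketch below reproduces that arithmetic for @aplusk's 4.7M followers at ten tweets a day over 30 days, comparing the quoted 1 KB per tweet with a bare 140 bytes. The full-copy fan-out model, the decimal units, and the helper names are assumptions made for illustration; it ignores metadata, replication, and multi-byte UTF-8 characters.

```python
"""Back-of-envelope storage estimate for fan-out-on-write timelines.

Hypothetical model: every tweet is copied in full into each follower's
timeline. Decimal units are used (1 KB = 1,000 bytes, 1 GB = 1e9, 1 TB = 1e12).
"""

FOLLOWERS = 4_700_000  # @aplusk's follower count, as quoted in the thread
TWEETS_PER_DAY = 10    # assumption taken from the thread
DAYS = 30              # one month


def fanout_bytes(bytes_per_tweet: int, followers: int, tweets: int) -> int:
    """Total bytes written when each tweet is copied to every follower."""
    return bytes_per_tweet * followers * tweets


def monthly_estimate(bytes_per_tweet: int) -> tuple[float, float]:
    """Return (GB written per single tweet, TB written per month)."""
    per_tweet = fanout_bytes(bytes_per_tweet, FOLLOWERS, 1)
    per_month = fanout_bytes(bytes_per_tweet, FOLLOWERS, TWEETS_PER_DAY * DAYS)
    return per_tweet / 1e9, per_month / 1e12


for size in (1_000, 140):  # quoted 1 KB per tweet vs. 140 single-byte characters
    gb, tb = monthly_estimate(size)
    print(f"{size:>5} B/tweet: {gb:.3f} GB per tweet, {tb:.3f} TB per month")

# prints:
#  1000 B/tweet: 4.700 GB per tweet, 1.410 TB per month
#   140 B/tweet: 0.658 GB per tweet, 0.197 TB per month
```

At 1 KB the month comes to roughly 1.4 TB for that single account, consistent with the quoted estimate; at 140 bytes it drops to about 0.2 TB, a factor of about seven.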

Sent from my iPhone


On Apr 16, 2010, at 12:02 PM, gabriele renzi <rff.rff@gmail.com> wrote:

> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang <peter78@gmail.com> wrote:
>> FB also does pics and movies, so 1MB is way off depending on where
>> they manage such binary data.
>
> Apparently not in Cassandra: http://www.facebook.com/note.php?note_id=76191543919
>
>> I do agree that 1MB of text alone is a lot of text, which is more
>> relevant in the case of Twitter. The only large thing you leave out
>> is denormalization. Every tweet you write is likely denormalized
>> across your followers to allow for quick read access.
>
> ... but considering many users have _millions_ of followers, this may
> be quite a bit more data. Assuming 1k per tweet, a single tweet from
> @aplusk (4.7M followers) would take more than 4 gigabytes of data.
> Assuming ten tweets a day, in one month he'd produce over one TB.
>
> I'd say they only store references (lists of increasing numbers can
> also be encoded very cleverly), or do it in some other way I'm not
> smart enough to think of.
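One way to read the suggestion about storing references: each follower's timeline holds only tweet IDs, and because those IDs increase, a sorted list can be stored as small gaps (deltas) packed into variable-length integers instead of fixed 8-byte values. The Python below is a minimal sketch of delta plus LEB128-style varint encoding. The function names and example IDs are hypothetical, and nothing in the thread says Twitter uses this particular scheme.

```python
"""Hypothetical sketch: delta + varint encoding of a sorted list of tweet IDs."""


def encode_varint(n: int, out: bytearray) -> None:
    """Append non-negative n as an LEB128-style varint (7 payload bits per byte)."""
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte of this number
            return


def encode_ids(ids: list[int]) -> bytes:
    """Encode a non-decreasing list of non-negative IDs as varint-coded gaps."""
    out = bytearray()
    prev = 0
    for i in ids:
        delta = i - prev
        if delta < 0:
            raise ValueError("IDs must be sorted and non-negative")
        encode_varint(delta, out)
        prev = i
    return bytes(out)


def decode_ids(data: bytes) -> list[int]:
    """Invert encode_ids: read varints and turn the gaps back into IDs."""
    ids: list[int] = []
    prev, n, shift = 0, 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7        # continuation byte: keep accumulating
        else:
            prev += n         # completed one gap; add it to the running ID
            ids.append(prev)
            n, shift = 0, 0
    return ids


# Hypothetical timeline of made-up, increasing tweet IDs.
timeline = [1_000_000, 1_000_003, 1_000_010, 1_000_250]
blob = encode_ids(timeline)
assert decode_ids(blob) == timeline
print(len(blob), "bytes vs", 8 * len(timeline), "bytes as fixed 64-bit IDs")
# prints: 7 bytes vs 32 bytes as fixed 64-bit IDs
```

The first ID costs a few bytes, but every later entry only pays for its gap, so densely packed recent IDs shrink to one or two bytes each.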
