cassandra-user mailing list archives

From Tatu Saloranta <tsalora...@gmail.com>
Subject Re: Regarding Cassandra Scalability
Date Fri, 16 Apr 2010 16:57:38 GMT
On Fri, Apr 16, 2010 at 9:17 AM, Mike Gallamore
<mike.e.gallamore@googlemail.com> wrote:
> On 04/16/2010 01:38 AM, dir dir wrote:
>
>> I hear Facebook.com and tweeter.com using cassandra database. In my opinion
>> Facebook and
>> tweeter have hundreds TB data.  because their user reach hundreds million
>> people.
>
> I think you might be forgetting just how tiny tweets are. The last numbers I
> heard tweeter gets 55,000,000 messages a day. They've been around for
> roughly 4 years. Even assuming they always had that number of messages
> (which isn't the case) that still would only be roughly 11TB of data if
> everyone sent the maximum tweet length. Sure add a bit to each message for a
> time stamp and the user that posted it but still I'd be surprised if every
> tweet including meta data was much more than 20TB.

While those are valid points, I think there are separate issues wrt
searching and indexing, where the number of entries is more relevant. I
mean, storing and accessing big BLOBs is not trivial, but in many ways it
is less problematic than craploads of smaller entries. This is probably
even more true for FB and LinkedIn, with their more graph-oriented
challenges. But it's not just (or even mainly) about raw storage; it's all
the slicing and dicing that makes things challenging.
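
For what it's worth, the quoted ~11TB figure is easy to sanity-check, and
the same arithmetic shows why the entry count is the more interesting
number. This is only a rough sketch: it assumes 140 bytes per tweet (the
maximum length at one byte per character), 365-day years, and 1 TB =
10^12 bytes. None of those assumptions come from Twitter itself, and the
constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the quoted estimate.
# All inputs are assumptions taken from the quoted message, not real data.

MESSAGES_PER_DAY = 55_000_000   # figure quoted in the thread
YEARS = 4                       # "roughly 4 years"
DAYS_PER_YEAR = 365             # assumption: ignore leap days
MAX_TWEET_BYTES = 140           # assumption: max length, 1 byte per char
BYTES_PER_TB = 10**12           # assumption: decimal terabytes


def estimate_total_messages(messages_per_day, years, days_per_year=DAYS_PER_YEAR):
    """Total number of entries if the daily rate had been constant."""
    return messages_per_day * days_per_year * years


def estimate_total_bytes(total_messages, bytes_per_message):
    """Raw payload size, ignoring metadata, indexes and replication."""
    return total_messages * bytes_per_message


total_messages = estimate_total_messages(MESSAGES_PER_DAY, YEARS)
total_bytes = estimate_total_bytes(total_messages, MAX_TWEET_BYTES)
total_tb = total_bytes / BYTES_PER_TB

print(f"entries:  {total_messages:,}")
print(f"raw size: {total_tb:.1f} TB")
```

So the raw payload is modest, but that is still on the order of 80
billion entries to index and slice.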

-+ Tatu +-
