incubator-cassandra-user mailing list archives

From Mike Gallamore <mike.e.gallam...@googlemail.com>
Subject Re: Regarding Cassandra Scalability
Date Fri, 16 Apr 2010 21:39:08 GMT
Does that include HD copies of CNN et al reading tweets to people on 
T.V.? You know your medium is doomed when you're reduced to reading 
comments from random_dude64 and omg69 because they get the news out 
faster than you can.

They must be tracking a lot more than just the tweets themselves (which 
is expected if they want to monetize the service), since even they only 
claim 50M tweets a day: http://blog.twitter.com/2010/02/measuring-tweets.html.

Not saying you're wrong, just that they must be doing a lot else with it. 
Perhaps their logs of delivery of the tweets are just the tweets 
themselves: they are small enough that generating a hash of the 
message and saving the hash into a log perhaps makes less sense than 
just going ahead and saving a copy for each person that is subscribed. 
Either way, crazy data. Lots of people have more data, but I doubt as 
much of it was typed by hand :-)
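
For what it's worth, here is a rough sketch of the copy-per-subscriber arithmetic, using the guesses quoted further down the thread (1 KB per tweet, 4.7M followers for @aplusk, ten tweets a day). The 30-day month and the 8-byte reference size are my own assumptions for illustration, not anything Twitter has published.

```python
# Back-of-the-envelope estimate; every input is a guess from the thread
# or a labeled assumption, not a measured value.
TWEET_BYTES = 1_000        # "1k per tweet", taken as a decimal kilobyte
FOLLOWERS = 4_700_000      # @aplusk follower count quoted in the thread
TWEETS_PER_DAY = 10        # from the thread
DAYS = 30                  # assumption: one month = 30 days
REF_BYTES = 8              # assumption: a reference is one 64-bit tweet ID


def fanout_bytes(entry_bytes, followers, tweets):
    """Bytes written if every follower's timeline gets one entry per tweet."""
    return entry_bytes * followers * tweets


per_tweet_copy = fanout_bytes(TWEET_BYTES, FOLLOWERS, 1)
monthly_copy = fanout_bytes(TWEET_BYTES, FOLLOWERS, TWEETS_PER_DAY * DAYS)
monthly_refs = fanout_bytes(REF_BYTES, FOLLOWERS, TWEETS_PER_DAY * DAYS)

print(f"one tweet, full copies: {per_tweet_copy / 1e9:.1f} GB")
print(f"one month, full copies: {monthly_copy / 1e12:.2f} TB")
print(f"one month, 8-byte refs: {monthly_refs / 1e9:.1f} GB")
```

Under these assumptions, storing a reference instead of a full copy shrinks the fan-out by the ratio of TWEET_BYTES to REF_BYTES, which is why the 1 KB guess matters so much.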
On 04/16/2010 02:22 PM, Stu Hood wrote:
> http://twitter.com/jromeh/status/12295736793
>
> -----Original Message-----
> From: "Mike Gallamore"<mike.e.gallamore@googlemail.com>
> Sent: Friday, April 16, 2010 3:46pm
> To: user@cassandra.apache.org
> Subject: Re: Regarding Cassandra Scalability
>
> Also people with 1M followers tend to have "public" tweets, which means
> really I think it would be the same as subscribing to an RSS feed or
> whatever. You aren't getting a local copy because you will "always" have
> access to the tweet as will everyone else. Also tweets don't change
> AFAIK, so there is no point in having redundant copies.
> On 04/16/2010 01:42 PM, Peter Chang wrote:
>    
>> Yeah. I wasn't sure if Cassandra was optimized for binary data
>> especially since any site of that size will use a CDN. Interesting
>> read though.
>>
>> I think 1K per tweet is off by an order of magnitude considering they
>> only allow 140 characters. Regardless, the number of users with > 1MM
>> is probably a handful. Also I'm guessing they purge data after a
>> certain window (like 30 days for example).
>>
>> Sent from my iPhone
>>
>>
>> On Apr 16, 2010, at 12:02 PM, gabriele renzi <rff.rff@gmail.com> wrote:
>>
>>
>>      
>>> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang<peter78@gmail.com>
>>> wrote:
>>>
>>>        
>>>> FB also does pics and movies so 1MB is way off depending on where
>>>> they manage such binary data.
>>>>
>>>>          
>>> apparently not in cassandra http://www.facebook.com/note.php?note_id=76191543919
>>>
>>>
>>>        
>>>> I do agree that 1MB of text alone is a lot of text
>>>> which is more relevant in the case of Twitter. The only large thing
>>>> you leave out is denormalization. Every tweet you write is likely
>>>> denormalized across your followers to allow for quick read access.
>>>>
>>>>          
>>> .. but considering many users have _millions_ of followers, this may
>>> be quite a bit more data. Assuming 1k per tweet, this would mean one
>>> from @aplusk (4.7M followers) would take more than 4 gigabytes of
>>> data. Assuming ten tweets a day, in one month he'd produce one TB.
>>>
>>> I'd say they only store references (increasing number lists can also
>>> be encoded very cleverly), or in some other way I'm not smart enough
>>> to think of.
>>>
>>>        
>
>
>    
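
On the "increasing number lists can also be encoded very cleverly" remark quoted above: the thread does not say which encoding was meant. One common approach is to store the gaps between sorted IDs as variable-length integers. The sketch below is only an illustration of that idea, not Twitter's or Cassandra's actual format; the function names are made up, and apart from the first ID (taken from the status URL quoted above) the sample IDs are hypothetical.

```python
def encode_ids(sorted_ids):
    """Delta-encode a non-decreasing list of non-negative ints as varints.

    Each gap is written 7 bits at a time, low bits first; the high bit of
    a byte is set when more bytes of the same gap follow.
    """
    out = bytearray()
    prev = 0
    for n in sorted_ids:
        gap = n - prev
        if gap < 0:
            raise ValueError("ids must be sorted in non-decreasing order")
        prev = n
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_ids(data):
    """Inverse of encode_ids: rebuild the original list from the gaps."""
    ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation bit set: keep reading this gap
        else:
            prev += value       # gap complete: add it to the running total
            ids.append(prev)
            value, shift = 0, 0
    return ids


# First ID is from the status URL in the thread; the rest are hypothetical.
ids = [12295736793, 12295736801, 12295737050, 12295740000]
packed = encode_ids(ids)
assert decode_ids(packed) == ids
print(len(packed), "bytes packed vs", 8 * len(ids), "bytes as raw 64-bit IDs")
```

The saving depends entirely on how close together the IDs in a timeline are: small gaps fit in one or two bytes, while the first ID still costs several.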

