incubator-cassandra-user mailing list archives

From Colin Clark <co...@cloudeventprocessing.com>
Subject Re: TechCrunch article on Twitter and Cassandra
Date Sat, 10 Jul 2010 19:22:19 GMT
I'm not aware of anyone classifying what twitter is doing today as 
'working.'  In fact, I believe that twitter's problems are much larger 
than just technology but that's a whole different subject.

What twitter may have realized is that they don't have the resources of 
Facebook, that Facebook's use case is fairly limited (although a large 
deployment), and that they may have been trudging off into the great 
unknown.

Although I'm a fan of Cassandra, there's no way I'd use it today for my 
tier 1 deployments, because I don't have the resources of Facebook, and 
even though Cassandra is open source, that doesn't mean I can fix it 
when it goes down.  And, because it's open source, there's no one to 
call to have it fixed reliably and within production constraints.  
Cassandra's strength is its greatest weakness right now.

The bloom is starting to come off NoSQL, which is normal - it means that 
people & firms are trying to do more with it and are most probably realizing 
that all of the tools, support, infrastructure, etc. surrounding 
alternative solutions aren't such a bad thing.  And that the world of 
NoSQL has to start coming up with a better mantra than "joins are bad, 
dude", and "you're just protecting the status quo."  There's a *lot 
more* big data wrapped up inside of SQL databases, and only a fraction of 
that in NoSQL - and there are a lot of reasons for it.

For example, do I *really* need Cassandra if MySQL will work for me and 
I just want to get up and running quickly without writing a bunch of 
code?  My team was pushing greater than 20k updates per second into, 
GASP, Oracle 5 years ago.  Sure, it was expensive.  But it worked.  And 
it was worth it - or we wouldn't have spent the $$.  What's your data 
worth if you don't have your data? Zero.

And then there's support - internal support.  Picking a database du jour 
is organizationally expensive.  Especially when there's probably one or 
two databases that Twitter could have bought off the shelf that would 
have solved their problems.  But instead of bolstering the reliability 
and robustness of their internal architecture, they've gone and used 
very expensive equity for acquisitions.  Running multiple databases in 
a fault-tolerant, geographically dispersed deployment isn't easy (yes, 
I've done it) and having multiple databases in the mix really 
complicates things.  And at this stage in Twitter's growth, I frankly 
don't understand why they're looking to complicate their technological 
landscape any more than absolutely required.

So, this entire rant can be summarized really quite succinctly:

"If data is your business (like Facebook & Twitter), and you don't have 
the resources to cost-effectively handle all of your data management 
needs internally (Facebook does, Twitter doesn't), then basing your 
solution on unproven storage solutions (commercial or open source, SQL 
or NoSQL) is a risky and short-sighted strategy."

Please send death threats via the channels listed below:


Colin
+1 315 886 3422 cell
+1 701 212 4314 office
http://blog.cloudeventprocessing.com
http://twitter.com/EventCloudPro

On 7/10/2010 2:02 PM, Ryan King wrote:
> On Sat, Jul 10, 2010 at 10:33 AM, Marty Greenia <martygreenia@gmail.com> wrote:
>    
>> It almost seems counter-intuitive. For analytics, you'd think they'd want a
>> database that supports more sophisticated query functionality (sql). Whereas
>> for everyday tweet storage, something fast and high-throughput (cassandra)
>> makes sense.
>>
>> I'd be curious to hear the details as well.
>>      
> These decisions aren't made in a vacuum. One of these use cases has an
> existing system that works, one doesn't.
>
> -ryan
>    
