cassandra-user mailing list archives

From Colin Clark
Subject Re: TechCrunch article on Twitter and Cassandra
Date Sat, 10 Jul 2010 23:40:00 GMT

Please see below - it sounds like you're taking this a little personally 
and I'm not sure why.  You've made some errors in your reply.

+1 315 886 3422 cell
+1 701 212 4314 office

On 7/10/2010 5:21 PM, Benjamin Black wrote:
> On Sat, Jul 10, 2010 at 12:22 PM, Colin Clark wrote:
>> Although I'm a fan of Cassandra, there's no way I'd use it today for my tier
>> 1 deployments, because I don't have the resources of Facebook, and even
>> though Cassandra is open source, that doesn't mean I can fix it when it goes
>> down.  And, because it's open source, there's no one to call to have it
>> fixed reliably and within production constraints.  Cassandra's strength is
>> its greatest weakness right now.
> There are others, however, who do have the skills not just to fix it
> when it goes down, but to improve the code in a variety of ways and
> contribute that code back to the project.  That you do not have those
> skills is a good indication you should stick to what you know, not an
> indictment of Cassandra (or any other non-SQL store).
I didn't say 'didn't have the skills.'  I said 'resources.'  Those are 
two very different things.  While I and my team have nothing to prove to 
you, working on Cassandra is completely within our realm of ability and 
expertise.  Not having the resources means that, relative to our current 
focus, we, our customers, and our investors get a bigger bang for each 
engineering $ spent by having us focus on different problems.  Using a 
piece of software isn't just an engineering issue, it has to make 
business sense as well.  So if I really wanted to use Cassandra in a 
mission critical way, I'd have to be able to justify the investment 
involved in creating an internal Cassandra team.  This is why there's so 
much 'flap' over what Twitter and Facebook are or are not using 
Cassandra for.
>> The bloom is starting to come off NoSQL, which is normal - it means that
>> people & firms are trying to do more with it and most probably realizing
>> that all of the tools, support, infrastructure, etc. surrounding alternative
>> solutions isn't such a bad thing.  And that the world of NoSQL has to start to
>> come up with a better mantra than "joins are bad, dude", and "you're just
>> protecting the status quo."  There's a *lot more* big data wrapped up inside
>> of SQL databases and only a fraction of that in NoSQL - and there's a lot of
>> reasons for it.
> You are, for whatever reason, using the dullest of cliches as if they
> were informed opinion.  Nobody with actual knowledge of the space says
> "joins are bad, dude".  What they might say is "When you have
> petabytes and low latency requirements, joins are an expensive
> proposition".  That is clearly a true statement and constructing
> indices in a column store to avoid joins is a reasonable decision to
> avoid that expense.  Is it free?  Of course not, nothing is.
Again, I'm a fan of NoSQL, and of Cassandra.  When I said, 'the world of 
NoSQL,' I was including myself in that world.  And, I agree that those 
cliches are dull, overused, and ill-informed (anyone who's actually done 
anything with a lot of data knows how expensive joins are - with or 
without petabytes).  But again, this is what business sees when they 
listen to Twitter, or subscribe to these mailing lists.  This is how 
opinions are formed in the minds of analysts and they then influence 
their customers.  We need to do a better job, and yet again, this is why 
understanding what Twitter and Facebook are or are not doing with 
Cassandra is important.
>> For example, do I *really* need Cassandra if MySQL will work for me and I
>> just want to get up and running quickly without writing a bunch of code?  My
>> team was pushing greater than 20k updates per second into, GASP, Oracle 5
>> years ago.  Sure, it was expensive.  But it worked.  And it was worth it -
>> or we wouldn't have spent the $$.  What's your data worth if you don't have
>> your data? zero.
> Had you spent any time on the irc channel you would've seen this
> advice given repeatedly.  If you don't need what Cassandra does, don't
> use it.  That you have seen 20k updates/sec on really expensive
> hardware with a SQL store is neither surprising nor relevant.  As you
> must realize, though choose to ignore, Cassandra is about more than
> just high, per-node write throughput.  It is about seamless scale-out
> of a single cluster, robustness in the face of node failure and
> network partition, etc.  Can you do that with a SQL store?  Certainly.
>   Expect to pay 5x in hardware and not be able to operate multi-DC.
> It's what folks call a trade-off.

So that's a trade-off?  Thanks - maybe Facebook and Twitter missed that 
before spending hundreds of thousands of $$ on a project only to later 
change course.  Include opportunity cost in that, and you're easily in 
the millions of wasted $ - or do we call that a 'learning exercise?'  I'd 
love to hear what Twitter & Facebook's boards (there I am again with 
that whole pesky 'business' thing) had to say about that.  And I'm 
assuming that the same thing might just happen to a tech team that chose 
to spend valuable cycles on evaluating/implementing Cassandra only to 
change course - they'd have to explain that as well.  And then they'd 
hear something like, "Dudes, you did what?  Even Facebook & Twitter 
decided not to use Cassandra that way!"  This is not as far fetched as 
it sounds.  Someone on my advisory board asked me a very similar 
question about our use of Cassandra and, given the recent news, whether 
or not it impacted our plans.

And I'm assuming that if you're going to frantically wave arms with "SQL 
costs 5x more and you can't do that multi-DC..." that you've got 
something to back that up?  'Cuz Facebook is using a SQL store, they're 
using it multi-DC, and they're running on commodity hardware, right?

>> And then there's support - internal support.  Picking a database du-jour is
>> organizationally expensive.  Especially when there's probably one or two
>> databases that Twitter could have bought off the shelf that would have
>> solved their problems.
> You have no idea what their actual problems are and are merely
> engaging in the favorite game of HN and similar venues: armchair
> engineering.
Sure I do.  But from a business perspective.  Their architecture doesn't 
scale right now very well.  They're running with reduced API limits and 
you still get the 'fail whale' more than occasionally.  People lose 
followers.  People lose tweets.  Privacy has been compromised.  Need I 
go on?  All of this would make me, as a potential customer of Twitter, 
ask a question, "So, what's up with the scalability thing?  What happens 
if I miss a critical time window with my sponsored Tweets?  Do I get 
that $ back?  I didn't get 'imprints', but the opportunity is gone."  But 
you're right, from an engineering point of view, I have no idea what 
their problems are.  I do know that Cassandra was supposed to fix some 
of them, and now it's not and I don't know anything about that from an 
engineering point of view either.

Also, I have no idea of what 'HN or similar venues' refers to.

> b
