cassandra-user mailing list archives

From Schubert Zhang <zson...@gmail.com>
Subject Re: TechCrunch article on Twitter and Cassandra
Date Sun, 11 Jul 2010 03:31:03 GMT
It is being ardently discussed at http://news.ycombinator.com/item?id=1502756
Here are my comments:
1. Cassandra is very young! In particular, the design and implementation of
local storage and local indexing are immature and not good.
2. Poor read performance is also due to the poor local storage
implementation.
3. The local storage, indexing and persistence structures are not stable.
They need to be re-designed and re-implemented. If Twitter moves its data to the
current Cassandra, it will have to do another move later to a new local storage,
indexing and persistence structure.

4. Twitter has very good experience with MySQL, but not with Cassandra. Building
and maintaining a product such as Cassandra needs more smart and practised
engineers.
5. There are many good techniques in Cassandra and other open-source
projects (such as Hadoop, HBase ...). But they are not ready for
production. Understand the details of these techniques and implement them in
your own projects/products.


On Sun, Jul 11, 2010 at 7:40 AM, Colin Clark <colin@cloudeventprocessing.com> wrote:

>  Benjamin,
>
> Please see below - it sounds like you're taking this a little personally
> and I'm not sure why.  You've made some errors in your reply.
>
>  Colin
> +1 315 886 3422 cell
> +1 701 212 4314 office
> http://blog.cloudeventprocessing.com
> http://twitter.com/EventCloudPro
>
> On 7/10/2010 5:21 PM, Benjamin Black wrote:
>
> On Sat, Jul 10, 2010 at 12:22 PM, Colin Clark <colin@cloudeventprocessing.com> wrote:
>
>
>  Although I'm a fan of Cassandra, there's no way I'd use it today for my tier
> 1 deployments, because I don't have the resources of Facebook, and even
> though Cassandra is open source, that doesn't mean I can fix it when it goes
> down.  And, because it's open source, there's no one to call to have it
> fixed reliably and within production constraints.  Cassandra's strength is
> its greatest weakness right now.
>
>
>
>  There are others, however, who do have the skills not just to fix it
> when it goes down, but to improve the code in a variety of ways and
> contribute that code back to the project.  That you do not have those
> skills is a good indication you should stick to what you know, not an
> indictment of Cassandra (or any other non-SQL store).
>
>
>
>  I didn't say 'didn't have the skills.'  I said 'resources.'  Those are two
> very different things.  While I and my team have nothing to prove to you,
> working on Cassandra is completely within our realm of ability and
> expertise.  Not having the resources means that, relative to our current
> focus, we, our customers, and our investors get a bigger bang for each
> engineering $ spent having us focus on different problems.  Using a piece of
> software isn't just an engineering issue, it has to make business sense as
> well.  So if I really wanted to use Cassandra in a mission critical way, I'd
> have to be able to justify the investment involved in creating an internal
> Cassandra team.  This is why there's so much 'flap' over what Twitter and
> Facebook are or are not using Cassandra for.
>
>  The bloom is starting to come off NoSQL, which is normal - it means that
> people & firms are trying to do more with it and most probably realizing
> that all of the tools, support, infrastructure, etc. surrounding alternative
> solutions isn't such a bad thing.  And that the world of NoSQL has to start to
> come up with a better mantra than "joins are bad, dude", and "you're just
> protecting the status quo."  There's a *lot more* big data wrapped up inside
> of SQL databases and only a fraction of that in NoSQL - and there are a lot of
> reasons for it.
>
>
>
>  You are, for whatever reason, using the dullest of cliches as if they
> were informed opinion.  Nobody with actual knowledge of the space says
> "joins are bad, dude".  What they might say is "When you have
> petabytes and low latency requirements, joins are an expensive
> proposition".  That is clearly a true statement and constructing
> indices in a column store to avoid joins is a reasonable decision to
> avoid that expense.  Is it free?  Of course not, nothing is.
>
>
>
>  Again, I'm a fan of NoSql, and of Cassandra.  When I said, 'the world of
> NoSQL,' I was including myself in that world.  And, I agree that those
> cliches are dull, overused, and ill-informed (anyone who's actually done
> anything with a lot of data knows how expensive joins are - with or without
> petabytes).  But again, this is what business sees when they listen to
> Twitter, or subscribe to these mailing lists.  This is how opinions are
> formed in the minds of analysts and they then influence their customers.  We
> need to do a better job, and yet again, this is why understanding what
> Twitter and Facebook are or are not doing with Cassandra is important.
>
>   For example, do I *really* need Cassandra if MySQL will work for me and I
> just want to get up and running quickly without writing a bunch of code?  My
> team was pushing greater than 20k updates per second into, GASP, Oracle 5
> years ago.  Sure, it was expensive.  But it worked.  And it was worth it -
> or we wouldn't have spent the $$.  What's your data worth if you don't have
> your data? zero.
>
>
>
>  Had you spent any time on the irc channel you would've seen this
> advice given repeatedly.  If you don't need what Cassandra does, don't
> use it.  That you have seen 20k updates/sec on really expensive
> hardware with a SQL store is neither surprising nor relevant.  As you
> must realize, though choose to ignore, Cassandra is about more than
> just high, per-node write throughput.  It is about seamless scale-out
> of a single cluster, robustness in the face of node failure and
> network partition, etc.  Can you do that with a SQL store?  Certainly.
>  Expect to pay 5x in hardware and not be able to operate multi-DC.
> It's what folks call a trade-off.
>
>
>
> So that's a trade-off?  Thanks - maybe Facebook and Twitter missed that
> before spending hundreds of thousands of $$ on a project only to later
> change course.  Include opportunity cost in that, and you're easily in the
> millions of wasted $ - or do we call that a 'learning exercise?'  I'd love to
> hear what Twitter & Facebook's boards (there I am again with that whole
> pesky 'business' thing again) had to say about that?  And I'm assuming that
> the same thing might just happen to a tech team that chose to spend valuable
> cycles on evaluating/implementing Cassandra only to change course - they'd
> have to explain that as well.  And then they'd hear something like, "Dudes,
> you did what?  Even Facebook & Twitter decided not to use Cassandra that
> way!"  This is not as far fetched as it sounds.  Someone on my advisory
> board asked me a very similar question about our use of Cassandra and given
> the recent news, whether or not that impacted our plans.
>
> And I'm assuming that if you're going to frantically wave arms with "SQL
> costs 5x more and you can't do that multi-DC..." that you've got something
> to back that up?  'Cuz Facebook is using a SQL store, they're using it
> multi-DC, and they're running on commodity hardware, right?
>
>
>
>  And then there's support - internal support.  Picking a database du-jour is
> organizationally expensive.  Especially when there's probably one or two
> databases that Twitter could have bought off the shelf that would have
> solved their problems.
>
>
>  You have no idea what their actual problems are and are merely
> engaging in the favorite game of HN and similar venues: armchair
> engineering.
>
>
>
>  Sure I do.  But from a business perspective.  Their architecture doesn't
> scale right now very well.  They're running with reduced API limits and you
> still get the 'fail whale' more than occasionally.  People lose followers.
> People lose tweets.  Privacy has been compromised.  Need I go on?  All of
> this would make me, as a potential customer of Twitter, ask a question: "So,
> what's up with the scalability thing?  What happens if I miss a critical
> time window with my sponsored Tweets?  Do I get that $ back?  I didn't get
> 'imprints' but the opportunity is gone."  But you're right, from an
> engineering point of view, I have no idea what their problems are.  I do
> know that Cassandra was supposed to fix some of them, and now it's not and I
> don't know anything about that from an engineering point of view either.
>
> Also, I have no idea of what 'HN or similar venues' refers to.
>
>  b
>
>
>
