hbase-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Cassandra vs HBase
Date Tue, 01 Sep 2009 22:12:21 GMT
> They have aspects in common -- java, datastores, apache -- but the
> differences are pretty acute:

This is a pretty fair summary, IMO.

> + Cassandra does eventual consistency.  HBase does strong consistency.  See
> http://devblog.streamy.com/2009/08/24/cap-theorem/ for more on this.

As I wrote in the other email, you can still get strong consistency
with Cassandra.  (But, you can't get row locking: that is a definite
win for HBase.  In my experience though most apps need locking less
than they think.)
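
A minimal sketch of one common way a replicated store can give strongly consistent reads: choose read and write replica counts R and W so that R + W > N, which forces every read set to overlap the latest write set. Whether this is exactly what the other email describes is an assumption; the function names below are hypothetical and not part of any Cassandra API.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    if n < 1:
        raise ValueError("need at least one replica")
    return n // 2 + 1


def quorums_overlap(n, r, w):
    """True when any r replicas read must include one of the w replicas written.

    n: replication factor, r: replicas consulted per read,
    w: replicas acknowledging each write (all hypothetical parameter names).
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# With 3 replicas, quorum reads plus quorum writes overlap;
# reading one replica after writing one replica does not.
strong = quorums_overlap(3, quorum(3), quorum(3))
weak = quorums_overlap(3, 1, 1)
```

The tradeoff is latency and availability: larger R and W mean more replicas must answer before an operation completes.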

The big win for Cassandra is that its p2p distribution model -- which
drives the consistency model -- means there is no single point of
failure.  A SPOF can be mitigated by failover but it's really, really
hard to get all the corner cases right with that approach.  Even
Google with their 3 year head start and huge engineering resources
still has trouble with that occasionally.  (See e.g.
http://groups.google.com/group/google-appengine/msg/ba95ded980c8c179.)

> + Cassandra does not have a natural sharding notion as there is in
> HBase -- i.e. HBase Regions -- so hooking Cassandra to MapReduce is awkward.

Actually that's not a big deal -- the token ring is known, so you can
break up at a coarse granularity there, and each node has a sampling
of the keys stored on it thanks to the way the sstable indexing works,
so generating hadoop input regions is pretty easy.  Jeff Hodges wrote
a proof of concept over at
https://issues.apache.org/jira/browse/CASSANDRA-342.
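
A rough sketch of the idea described above, not the code from CASSANDRA-342: cut the ring into one range per node token, then subdivide each range at some of the keys sampled on that node. It assumes tokens and keys are plain integers, and the function names and the `max_splits` parameter are hypothetical.

```python
def ring_ranges(tokens):
    """One (start, end] range per node token; the first range wraps around the ring."""
    ordered = sorted(tokens)
    # ordered[i - 1] with i == 0 picks the last token, giving the wrapping range.
    return [(ordered[i - 1], ordered[i]) for i in range(len(ordered))]


def split_range(start, end, sampled_keys, max_splits):
    """Subdivide a non-wrapping range at evenly spaced sampled keys.

    Wrapping ranges (start >= end) are returned whole to keep the sketch short.
    """
    if max_splits < 1:
        raise ValueError("max_splits must be at least 1")
    if start >= end:
        return [(start, end)]
    inside = sorted(k for k in sampled_keys if start < k < end)
    pieces = min(max_splits, len(inside) + 1)
    # Pick pieces - 1 boundaries spread evenly through the sampled keys.
    bounds = [inside[(i * len(inside)) // pieces] for i in range(1, pieces)]
    edges = [start] + bounds + [end]
    return list(zip(edges[:-1], edges[1:]))


# Three nodes, then three input splits for the range owned by the node at token 100.
ranges = ring_ranges([200, 0, 100])
splits = split_range(0, 100, [10, 20, 30, 40, 50, 60, 70, 80, 90], 3)
```

Each resulting pair could then back one Hadoop input split, scheduled on the node that owns the enclosing range.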

> + The Cassandra fellas talk of their app being one ball of code only whereas
> with HBase there is HDFS, ZooKeeper and then HBase itself (Apparently it has
> fewer lines of code too).

Opinions may differ, but I still think this is a huge win for troubleshooting.

> Less tangible differences -- or differences that can be addressed through
> application and development -- would include community, maturity, number and
> variety of production installs, and features (monitoring, shells, UIs, admin
> tools, etc.).  On these latter dimensions, HBase would seem to do better but
> do the research and make your own call.

I agree that HBase does better on some of these metrics right now, but
I also think Cassandra is accelerating faster. :)

-Jonathan (cassandra committer)
