hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: hbase versus cassandra
Date Mon, 23 Nov 2009 20:09:50 GMT
Ah, the classic.  Well, since you're on the HBase list, my suggestion is
going to have to be "use HBase".  There are other advantages to HBase
over Cassandra:

- atomic row changes
- row locking
- increment value operation
- strong local consistency
- multiple versioning
- no possibility of corrupted data due to normal operations
- HBase is faster, for both reads and writes
- more flexible clustering strategy - you CAN grow an HBase cluster 2x,
4x, 10x instantly.

So it really isn't just "hadoop + caching".  There is much more here,
and there are some significant and difficult-to-describe downsides to
the Cassandra model.  If you peruse their mailing list you will see
phrases like "pick your tokens carefully" and "the order partitioner
doesn't evenly load all boxes" etc.  You have to manage your keyspace
very carefully with Cassandra, whereas with HBase the major concern is
to not have a key hotspot (e.g. always appending with a timestamp).
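
To make the key hotspot concern concrete, here is a minimal sketch of
one common workaround, a hashed "salt" prefix on the row key.  This is
only an illustration, not anything HBase-specific: the bucket count,
key layout and function names below are all assumptions made for the
example.

```python
import hashlib

# Hypothetical number of salt buckets; in practice this would be chosen
# with the number of region servers in mind.
NUM_BUCKETS = 16


def sequential_key(ts_millis: int) -> str:
    """Timestamp-only row key: every new row sorts after the previous
    one, so all writes land at the end of the key range (a hotspot)."""
    return f"{ts_millis:013d}"


def salted_key(ts_millis: int, source_id: str) -> str:
    """Row key with a stable hashed prefix, so writes from different
    sources are spread across NUM_BUCKETS separate key ranges."""
    digest = hashlib.md5(source_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    return f"{bucket:02d}-{ts_millis:013d}-{source_id}"


if __name__ == "__main__":
    base = 1258999790000  # arbitrary example timestamp in milliseconds
    for i in range(5):
        print(sequential_key(base + i), salted_key(base + i, f"host{i}"))
```

The trade-off is on the read side: a time-range scan now has to be
issued once per bucket and the results merged.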

Another way to decide in the absence of information is to look at the
underlying models, Bigtable vs Dynamo.  Dynamo is used in the shopping
cart at Amazon and _nothing else_.  Bigtable is used by nearly every
Google product and drives Google App Engine.  A recent presentation
said the largest Bigtable instance was 40 PB.  The Dynamo paper said
there were scaling problems at a few hundred nodes (gossip breaks
down).

I strongly believe that the Bigtable model is more flexible, more
suitable for more purposes and generally more scalable than the Dynamo
model.  The evidence is plain and stark.

One last note: it seems that most Cassandra installations tend to use
it for really only one purpose, and that is it.  Take Facebook: I have
not heard that they have expanded the use of Cassandra beyond inbox
search.  If you aren't growing, you're dying.

-ryan

On Mon, Nov 23, 2009 at 11:56 AM, Tim Robertson
<timrobertson100@gmail.com> wrote:
> Hi Adam,
>
> I am not the person to answer having not used Cassandra, but have
> spotted this being discussed on the list recently on a long thread:
>
> Search for "Cassandra vs HBase" on this page:
> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200909.mbox/thread
>
> There is also an article:
> http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
>
> Hope this helps with your background reading.
>
> Cheers,
> Tim
>
> On Mon, Nov 23, 2009 at 8:34 PM, Adam Fisk <a@littleshoot.org> wrote:
>> Hi Everyone- I'm implementing a new data layer and am struggling to
>> decide between HBase and Cassandra. The primary advantages of HBase as
>> far as I can tell are:
>>
>> 1) Tighter integration with Hadoop, making it easier to run M/R for
>> reporting and analytics
>> 2) Better caching layer
>>
>> Cassandra's thrift API seems a little more fleshed out to me, and
>> Facebook and Twitter give it a strong stamp of approval.
>>
>> Read performance is a major concern in our case. Can anyone lend a
>> hand in this debate? It seems difficult to me because there are likely
>> few people who have done significant implementations in both, but any
>> help is much appreciated.
>>
>> Thanks so much.
>>
>> -Adam
>>
>> --
>> Adam Fisk
>> http://www.littleshoot.org | http://adamfisk.wordpress.com |
>> http://twitter.com/adamfisk
>>
>
