hbase-user mailing list archives

From Adam Fisk...@littleshoot.org>
Subject Re: hbase versus cassandra
Date Mon, 23 Nov 2009 22:09:56 GMT
Thanks guys - super helpful. My background is in p2p, but I adhere to
Martin Fowler's "First Law of Distributed Object Design" wherever
possible - Don’t distribute your objects! The timestamp trick for
avoiding hotspots makes a lot of sense, and it's tough to argue with
"hbase is faster," as I generally prefer faster.

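To make the hotspot point concrete: a row key that starts with a monotonically increasing timestamp makes every new write sort after all existing rows, so all of those writes land in the last key range (region). One common workaround, not necessarily the exact trick meant in this thread, is to prefix the key with a small hash-derived bucket ("salting"). The Python sketch that follows is a toy model, not HBase client code: the bucket count, key layout, split points, and function names are all hypothetical.

```python
import bisect
import hashlib
from collections import Counter

NUM_BUCKETS = 8  # hypothetical: number of salt buckets / pre-split regions


def plain_row_key(timestamp_ms: int, event_id: str) -> bytes:
    # Timestamp-first key: every new write sorts after all existing rows.
    return f"{timestamp_ms:013d}-{event_id}".encode("utf-8")


def salted_row_key(timestamp_ms: int, event_id: str) -> bytes:
    # Deterministic hash-derived bucket prefix, so a reader who knows the
    # event_id can recompute the bucket for a point lookup.
    bucket = hashlib.md5(event_id.encode("utf-8")).digest()[0] % NUM_BUCKETS
    return f"{bucket:02d}-{timestamp_ms:013d}-{event_id}".encode("utf-8")


def region_for(key: bytes, split_keys: list) -> int:
    # Toy range partitioning: region i covers [split_keys[i-1], split_keys[i]),
    # with keys compared as raw bytes.
    return bisect.bisect_right(split_keys, key)


def load_per_region(keys, split_keys) -> Counter:
    return Counter(region_for(k, split_keys) for k in keys)


# Hypothetical split points: the plain layout was split on older timestamps;
# the salted layout is pre-split on the bucket prefixes "01".."07".
PLAIN_SPLITS = [f"{1_250_000_000_000 + i * 1_000_000:013d}".encode() for i in range(1, NUM_BUCKETS)]
SALTED_SPLITS = [f"{b:02d}".encode() for b in range(1, NUM_BUCKETS)]

if __name__ == "__main__":
    writes = [(1_258_963_796_000 + i, f"evt-{i}") for i in range(1000)]
    plain = load_per_region([plain_row_key(t, e) for t, e in writes], PLAIN_SPLITS)
    salted = load_per_region([salted_row_key(t, e) for t, e in writes], SALTED_SPLITS)
    print("plain: ", dict(plain))  # all 1000 writes land in region 7, the last one
    print("salted:", dict(sorted(salted.items())))  # writes spread across all 8 regions
```

The trade-off of this design: a scan over a time range now has to be issued once per bucket and the results merged, because consecutive timestamps no longer sit next to each other in a single key range.
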
I'm surprised HBase is faster for writes given Cassandra's eventual
consistency model. Can anyone explain why? Is it because HBase somehow
knows where data has been replicated to, and just sends the queries to
those nodes?

It's extremely exciting that both projects exist at all, and thanks for all
your hard work. Depending on which route we go, I might be piping up
on the list much more often.

Thanks again.

-Adam


On Mon, Nov 23, 2009 at 12:09 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> Ah, the classic. Well, since you're on the HBase list, my suggestion is
> going to have to be "use HBase". There are other advantages to HBase
> over cassandra:
>
> - atomic row changes
> - row locking
> - increment value operation
> - strong local consistency
> - multiple versioning
> - no possibility of corrupted data due to normal operations
> - hbase is faster! read and write
> - more flexible clustering strategy - you CAN grow an HBase cluster 2x,
> 4x, 10x instantly.
>
> So it really isn't just "Hadoop + caching". There is much more here,
> and there are some significant and difficult-to-describe downsides to
> the Cassandra model. If you peruse their mailing list you will see
> phrases like "pick your tokens carefully" and "the order partitioner
> doesn't evenly load all boxes", etc. You have to manage your keyspace
> very carefully with Cassandra, whereas with HBase the major concern is
> to not have a key hotspot (e.g., always appending with a timestamp).
>
> Another way to decide in the absence of information is to look at the
> underlying models, Bigtable vs. Dynamo. Dynamo is used in the shopping
> cart at Amazon and _nothing else_. Bigtable is used by nearly every
> Google product and drives Google App Engine. A recent presentation
> said the largest Bigtable instance was 40 PB. The Dynamo paper said
> there were scaling problems at a few hundred nodes (gossip breaks
> down).
>
> I strongly believe that the Bigtable model is more flexible, more
> suitable for more purposes and generally more scalable than the Dynamo
> model. The evidence is plain and stark.
>
> One last note: it seems that most Cassandra installations tend to use
> it for only one purpose. Take Facebook: I have not heard of them
> expanding the use of Cassandra beyond inbox search.
> If you aren't growing, you're dying.
>
> -ryan
>
> On Mon, Nov 23, 2009 at 11:56 AM, Tim Robertson
> <timrobertson100@gmail.com> wrote:
>> Hi Adam,
>>
>> I'm not the right person to answer, having not used Cassandra, but I have
>> spotted this being discussed recently in a long thread on the list:
>>
>> Search for "Cassandra vs HBase" on this page:
>> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200909.mbox/thread
>>
>> There is also an article:
>> http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
>>
>> Hope this helps with your background reading.
>>
>> Cheers,
>> Tim
>>
>> On Mon, Nov 23, 2009 at 8:34 PM, Adam Fisk <a@littleshoot.org> wrote:
>>> Hi Everyone- I'm implementing a new data layer and am struggling to
>>> decide between HBase and Cassandra. The primary advantages of HBase as
>>> far as I can tell are:
>>>
>>> 1) Tighter integration with Hadoop, making it easier to run M/R for
>>> reporting and analytics
>>> 2) Better caching layer
>>>
>>> Cassandra's Thrift API seems a little more fleshed out to me, and
>>> Facebook and Twitter give it a strong stamp of approval.
>>>
>>> Read performance is a major concern in our case. Can anyone lend a
>>> hand in this debate? It seems difficult to me because there are likely
>>> few people who have done significant implementations in both, but any
>>> help is much appreciated.
>>>
>>> Thanks so much.
>>>
>>> -Adam
>>>
>>> --
>>> Adam Fisk
>>> http://www.littleshoot.org | http://adamfisk.wordpress.com |
>>> http://twitter.com/adamfisk
>>>
>>
>
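Ryan's list above includes atomic row changes, row locking, and an increment operation. To show why those matter for something like a hit counter, here is a toy Python model, not HBase code: `ToyRowStore` and `naive_lww_increments` are hypothetical names, the store is an in-memory dict with one lock per row, and the contrast case assumes a naive client-side read-modify-write against a store that resolves concurrent writes by keeping the newest timestamp.

```python
import threading
from collections import defaultdict


class ToyRowStore:
    """Hypothetical in-memory table: row key -> {column: int}, one lock per row.
    Models only the semantics of a row-locked increment, not any real client API."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # serializes creation of per-row locks

    def _row_lock(self, row):
        with self._guard:
            return self._locks[row]

    def increment(self, row, column, amount=1):
        # The read-modify-write runs entirely under the row's lock, so two
        # concurrent increments of the same cell cannot overwrite each other.
        with self._row_lock(row):
            cells = self._rows[row]
            cells[column] = cells.get(column, 0) + amount
            return cells[column]

    def get(self, row, column):
        with self._row_lock(row):
            return self._rows[row].get(column, 0)


def naive_lww_increments(n_clients):
    # Hypothetical naive pattern: every client reads the same stale value (0),
    # adds 1 locally, and writes back with its own timestamp; last-write-wins
    # conflict resolution keeps only the newest (timestamp, value) pair.
    writes = [(ts, 0 + 1) for ts in range(n_clients)]
    return max(writes)[1]


if __name__ == "__main__":
    store = ToyRowStore()

    def worker():
        for _ in range(1000):
            store.increment("page:/index", "hits")

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(store.get("page:/index", "hits"))  # 8000: no increment is lost
    print(naive_lww_increments(8))  # 1: seven of the eight increments are lost
```

The contrast is only about that naive read-modify-write pattern; it says nothing about how a well-designed application on either system must be written.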



