Date: Tue, 30 Aug 2011 13:29:25 +0200
Subject: Re: HBase and Cassandra on StackOverflow
From: Bernd Fondermann
To: user@hbase.apache.org, Andrew Purtell

On Tue, Aug 30, 2011 at 11:47, Andrew Purtell wrote:
> Hi Chris,
>
> Appreciate your answer on the post.
>
> Personally speaking however the endless Cassandra vs. HBase discussion is tiresome and rarely do blog posts or emails in this regard shed any light. Often, Cassandra proponents mis-state their case out of ignorance of HBase or due to commercial or personal agendas. It is difficult to find clear eyed analysis among the partisans. I'm not sure it will make any difference posting a rebuttal to some random thing jbellis says. Better to focus on improving HBase than play whack a mole.
>
> Regarding some of the specific points in that post:
>
> HBase is proven in production deployments larger than the largest publicly reported Cassandra cluster, ~1K versus 400 or 700 or somesuch. But basically this is the same order of magnitude, with HBase having a slight edge. I don't see a meaningful difference here. Stating otherwise is false.
>
> HBase supports replication between clusters (i.e. data centers). I believe, but admit I'm not super familiar with the Cassandra option here, that the main difference is HBase provides simple mechanism and the user must build a replication architecture useful for them; while Cassandra attempts to hide some of that complexity. I do not know if they succeed there, but large scale cross data center replication is rarely one size fits all so I doubt it.
>
> Cassandra does not have strong consistency in the sense that HBase provides. It can provide strong consistency, but at the cost of failing any read if there is insufficient quorum. HBase/HDFS does not have that limitation. On the other hand, HBase has its own and different scenarios where data may not be immediately available. The differences between the systems are nuanced and which to use depends on the use case requirements.
>
> Cassandra's RandomPartitioner / hash based partitioning means efficient MapReduce or table scanning is not possible, whereas HBase's distributed ordered tree is naturally efficient for such use cases, I believe explaining why Hadoop users often prefer it. This may or may not be a problem for any given use case. Using an ordered partitioner with Cassandra used to require frequent manual rebalancing to avoid blowing up nodes. I don't know if more recent versions still have this mis-feature.
>
> Cassandra is no less complex than HBase. All of this complexity is "hidden" in the sense that with Hadoop/HBase the layering is obvious -- HDFS, HBase, etc. -- but the Cassandra internals are no less layered. An impartial analysis of implementation and algorithms will reveal that Cassandra's theory of operation in its full detail is substantially more complex. Compare the BigTable and Dynamo papers and this is clear. There are actually more opportunities for something to go wrong with Cassandra.
>
> While we are looking at codebases, it should be noted that HBase has substantially more unit tests.
>
> With Cassandra, all RPC is via Thrift with various wrappers, so actually all Cassandra clients are second class in the sense that jbellis means when he states "Non-Java clients are not second-class citizens".
>
> The master-slave versus peer-to-peer argument is larger than Cassandra vs. HBase, and not nearly as one sided as claimed. The famous (infamous?) global failure of Amazon's S3 in 2008, a fully peer-to-peer system, due to a single flipped bit in a gossip message demonstrates how in peer to peer systems every node can be a single point of failure. There is no obvious winner, instead, a series of trade offs. Claiming otherwise is intellectually dishonest. Master-slave architectures seem easier to operate and reason about in my experience. Of course, I'm partial there.
>
> I have just scratched the surface.

+1, insightful.
Thanks for posting this.

  Bernd
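
For readers who want the partitioning point in the quoted message made concrete, here is a minimal toy sketch. It is not HBase or Cassandra code: the node count, the split points, and the helper names (`hash_node`, `ordered_node`, `nodes_for_scan`) are all made up for illustration. It only shows where the keys of one contiguous range end up under hash-based placement versus ordered, range-based placement.

```python
import bisect
import hashlib

NODES = 4  # hypothetical cluster size

def hash_node(key: str) -> int:
    """Hash partitioning: the owning node is derived from a digest of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES

# Ordered partitioning: each node owns one contiguous key range.
# Hypothetical split points: node 0 owns keys < "g", node 1 owns
# "g" <= key < "n", node 2 owns "n" <= key < "t", node 3 owns the rest.
SPLITS = ["g", "n", "t"]

def ordered_node(key: str) -> int:
    """Range partitioning: the owning node is found by key order."""
    return bisect.bisect_right(SPLITS, key)

def nodes_for_scan(keys, start, stop, locate):
    """Return the nodes that hold at least one key in [start, stop)."""
    return sorted({locate(k) for k in keys if start <= k < stop})

# 26 * 20 synthetic row keys: a00 .. a19, b00 .. b19, and so on.
keys = [f"{c}{i:02d}" for c in "abcdefghijklmnopqrstuvwxyz" for i in range(20)]

ordered = nodes_for_scan(keys, "a", "c", ordered_node)
hashed = nodes_for_scan(keys, "a", "c", hash_node)
print("ordered partitioning, nodes touched:", ordered)
print("hash partitioning, nodes touched:", hashed)
```

With ordered placement the scan over ["a", "c") stays on the single node that owns that range, while the hashed keys are spread over several nodes. In a real hash-partitioned store the client also cannot know in advance which nodes hold keys from the range, so a range scan has to consult all of them. The flip side, as the quoted message notes, is that ordered placement can concentrate load and may need rebalancing.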