Date: Thu, 8 Sep 2011 00:19:07 +0530
From: Arvind Jayaprakash
To: user@hbase.apache.org
Subject: Re: HBase Vs CitrusLeaf?
On Sep 06, Something Something wrote:
>Anyway, before I spent a lot of time on it, I thought I should check if
>anyone has compared HBase against CitrusLeaf. If you've, I would greatly
>appreciate it if you would share your experiences.

Disclaimer: I was an early evaluator/tester of citrusleaf about a year ago, when it was in its infancy. Though I am not affiliated with them in any manner, I might be more benevolent toward them than most readers of this mailing list.

The short answer is that hbase and citrusleaf (called CL in the remainder of this mail) are very different products.

CL cares a lot more about predictable latencies than hbase does. This is manifested in two aspects of the design:

* It is heavily optimized for large RAM + SSD usage. While hbase does a fair job of using RAM, I can say for sure that both the throughput and latency trends are much better with CL in cases where spinning disks are not used directly in the read/write path.

* Multiple machines can concurrently and actively handle requests for the same key, so the loss of one server does not make a range of keys temporarily unavailable. An hbase cluster does have a partial, temporary outage when a region server dies. Things don't get back to normal immediately even after a new server takes over, since not all of the region's data may be available as local disk reads on the new server. Even if it is, it won't be readily waiting for you in fast memory.
* A third aspect, more of a side effect, is that HDFS still has a SPOF in the form of the namenode, which continues to be a cause for concern with respect to overall uptime guarantees.

Here is where hbase would do much better:

* It is designed for much larger data, to the point where it is natural for the entire dataset to be much larger than the total available RAM and for hard disks to be the primary storage medium.

* A bigtable implementation is designed for both ranged scans and full table scans. Last I recall, CL was more of a DHT, so ranged scans are infeasible, and doing full scans would qualify as much more than shooting oneself in the foot.

And here is where hbase has advantages in principle:

* As others mentioned, there are "textbook" advantages to using an open source solution.

* hbase has definitely run both longer and on larger clusters than CL possibly has.

While generalizations are dangerous, the one place where C++ code could shine over Java (the JVM, really) is that one does not have to fight the GC. I'd personally be more comfortable handing off, say, 48GB of memory to good C/C++ code than to the JVM. That being said, the folks working on hbase have been actively addressing this problem to the extent possible in pure Java, by having hbase manage large chunks of heap memory itself (MemStore-Local Allocation Buffers, which reduce the heap fragmentation that leads to long GC pauses). Search for "mslab hbase" to learn more about it.

My conclusion is that the two products address different problem spaces. So I'd urge you to spend time understanding your access patterns and see which product they map to more closely.

Feel free to contact me off list if you need to ask anything that is not appropriate for the mailing list but is relevant to this discussion.
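To make the point about ranged scans in a DHT concrete, here is a minimal Python sketch, assuming a toy cluster of 8 nodes and made-up row keys like "user00042". It contrasts generic hash placement (a stand-in for a DHT; this is not CL's actual partitioning scheme) with hbase-style contiguous key ranges, and counts how many nodes a 100-key ranged scan has to touch. The names NUM_NODES, hash_node, make_split_points, range_node and nodes_for_scan are all hypothetical, chosen for illustration only.

```python
import bisect
import hashlib

NUM_NODES = 8  # hypothetical toy cluster size, for illustration only


def hash_node(key: str, num_nodes: int = NUM_NODES) -> int:
    """DHT-style placement (simplified): a stable hash of the key picks the node."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def make_split_points(keys, num_nodes: int = NUM_NODES) -> list:
    """Range placement: cut the sorted key space into contiguous ranges.

    Each split point is the first key of a range, loosely like the
    region start keys in hbase.
    """
    ordered = sorted(keys)
    step = len(ordered) // num_nodes
    return [ordered[i * step] for i in range(1, num_nodes)]


def range_node(key: str, split_points: list) -> int:
    """Index of the contiguous range (node) that owns this key."""
    return bisect.bisect_right(split_points, key)


def nodes_for_scan(start: str, stop: str, keys, locate) -> set:
    """Set of nodes holding at least one key in [start, stop)."""
    return {locate(k) for k in keys if start <= k < stop}


if __name__ == "__main__":
    keys = [f"user{i:05d}" for i in range(10_000)]
    splits = make_split_points(keys)

    by_range = nodes_for_scan("user01000", "user01100", keys,
                              lambda k: range_node(k, splits))
    by_hash = nodes_for_scan("user01000", "user01100", keys, hash_node)

    print("range placement, nodes touched:", len(by_range))  # 1
    print("hash placement, nodes touched:", len(by_hash))
```

With range placement the scan over 100 adjacent keys lands on a single node. With hash placement the same keys are scattered across most or all of the nodes, so a ranged scan turns into a scatter-gather over the cluster, and the results have to be merged to come back in key order.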