From: David Jeske <davidj@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 22 Nov 2010 16:50:40 -0800
Subject: Re: cassandra vs hbase summary (was facebook messaging)

This is my second attempt at a summary of Cassandra vs HBase consistency and performance for an HBase-acceptable workload. I think these subtleties are tricky to understand, yet it's helpful for the community to understand them. I'm not trying to state my own facts (or opinions) but merely to summarize what I've read.

Again, please correct any facts which are wrong. Thanks for the kind and thoughtful responses!

*1) Cassandra can't replicate the consistency situation of HBase.* Namely, that once a write is finished, the new value will either always appear or never appear.

[In Cassandra] "Provided at least one node receives the write, it will eventually be written to all replicas. A failure to meet the requested ConsistencyLevel is just that; not a failure to write the data itself. Once the write is received by a node, it will eventually reach all replicas; there is no rollback." - Nick Telford [ref]

In Cassandra (N3/W3/R1, N3/W2/R2, or N3/W3/R3), a write can reach a single node and fail to meet the requested write consistency; a readback can then show the old value, but later show the new value once the write that did occur is propagated.

[In HBase] Once a region master accepts a write, it has been flushed to the HDFS log. If the region server goes down while writing: if the write was finished to any copy of the HDFS log, the new region master will accept and propagate the write; if not, the write will never appear.
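To make the Cassandra half of this concrete, here is a minimal sketch of that scenario. It is a toy in-memory model of my own (not the real client API or replication code): a write lands on one of three replicas, the requested W is not met, yet nothing is rolled back, and the value eventually propagates.

import random

# Toy model of point (1): three replicas, each mapping key -> (value, timestamp).
replicas = [dict(), dict(), dict()]

def write(key, value, ts, w_required, reachable):
    """Apply the write to every reachable replica and report whether the
    requested write ConsistencyLevel (W) was met. A failed W does NOT
    undo the copies that already landed -- there is no rollback."""
    for r in reachable:
        replicas[r][key] = (value, ts)
    return len(reachable) >= w_required

def read(key, r_required):
    """Read from R randomly chosen replicas; the newest timestamp wins."""
    chosen = random.sample(range(len(replicas)), r_required)
    versions = [replicas[r].get(key, (None, -1)) for r in chosen]
    return max(versions, key=lambda v: v[1])[0]

def anti_entropy(key):
    """Eventually (hinted handoff / read repair / repair), the newest
    version reaches every replica."""
    newest = max((r.get(key, (None, -1)) for r in replicas), key=lambda v: v[1])
    for r in replicas:
        r[key] = newest

write("k", "old", ts=1, w_required=3, reachable=[0, 1, 2])   # all replicas agree

# Only replica 0 is reachable; a W=3 write fails the ConsistencyLevel...
print(write("k", "new", ts=2, w_required=3, reachable=[0]))  # False
print(read("k", r_required=1))   # "old" or "new", depending on replica chosen
anti_entropy("k")
print(read("k", r_required=1))   # always "new" -- the "failed" write still won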
*2) Cassandra has a less efficient use of memory, particularly for data pinned in memory.* With 3 replicas on Cassandra, each element of data pinned in memory is kept on 3 servers, whereas in HBase only region masters keep the data in memory, so there is only one copy of each data element.

CASSANDRA-1314 provides an opportunity to allow a 'soft master', where reads prefer a particular replica. Combined with disabling read-repair, this should allow more efficient memory usage for data pinned or cached in memory. #1 above is still true, namely that a write may occur only to a node which is not the soft master, and the new value may not appear for a while and then eventually appear. However, with N3/W3/R1, once a write appears at the soft master it will remain, so as long as the soft-master preference can be honored, this comes closer to HBase's consistency.
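A back-of-the-envelope illustration of the memory point (the numbers are mine, purely illustrative; the soft-master behavior is the CASSANDRA-1314 idea described above):

# Illustrative numbers only -- not from the thread.
working_set_gb = 100   # hot data we want served from RAM
rf = 3                 # Cassandra replication factor

def hot_ram_needed_gb(soft_master_reads):
    """Cluster-wide RAM needed to keep the working set hot.

    Default Cassandra: reads (and read-repair) touch any replica, so all
    rf replicas end up caching the hot data. With a CASSANDRA-1314-style
    soft master (reads pinned to one preferred replica, read-repair
    disabled), only one replica per key needs the data hot -- similar to
    HBase, where a region is cached only on the region server owning it.
    """
    copies_hot = 1 if soft_master_reads else rf
    return working_set_gb * copies_hot

print(hot_ram_needed_gb(False))  # 300 GB: default Cassandra behavior
print(hot_ram_needed_gb(True))   # 100 GB: soft-master reads (~ HBase)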
*3) HBase can't match the row-availability situation of Cassandra (N3/W2/R2).* In the face of a single machine failure, if it is a region master, those keys are offline in HBase until a new region master is elected and brought online. In Cassandra, no single node failure causes the data to become unavailable.

*4) Two Cassandra configurations are closest to the consistency situation of HBase, and provide slightly different node-failure characteristics.* (Note: #1 above means Cassandra can't truly reach the same consistency situation as HBase.)

In Cassandra (N3/W3/R1), a node failure will disallow writes to a key range during the replica rebuild, while still allowing reads.

In Cassandra (N3/W2-3/R2), a node failure will allow both reads and writes to continue, while requiring uncached reads to contact two servers. (Requiring a response from two servers may increase common-case latency, but may hide latency from GC spikes, since any two of the three may respond.)

In HBase, if an HDFS node fails, both reads and writes continue; when a region master fails, both reads and writes are stalled until the region master is replaced.
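The failure behavior of the two configurations falls out of simple quorum arithmetic (reads overlap the latest write when R + W > N); a minimal check of the two cases, again just an illustration:

def quorum_properties(n, w, r, nodes_down=0):
    """For n replicas, w write acks and r read responses required:
    can each operation still be served with `nodes_down` replicas
    unavailable, and do reads always overlap the latest write?"""
    return {
        "reads_overlap_writes": r + w > n,        # R + W > N guarantee
        "writes_available": n - nodes_down >= w,
        "reads_available": n - nodes_down >= r,
    }

# N3/W3/R1 with one node down: writes blocked, reads continue.
print(quorum_properties(3, 3, 1, nodes_down=1))
# -> {'reads_overlap_writes': True, 'writes_available': False, 'reads_available': True}

# N3/W2/R2 with one node down: both reads and writes continue.
print(quorum_properties(3, 2, 2, nodes_down=1))
# -> {'reads_overlap_writes': True, 'writes_available': True, 'reads_available': True}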

Was that a better summary? Is it closer to correct?
