Subject: Re: cassandra vs hbase summary (was facebook messaging)
From: Edward Capriolo
To: user@cassandra.apache.org
Date: Mon, 22 Nov 2010 17:39:57 -0500

On Mon, Nov 22, 2010 at 5:14 PM, Todd Lipcon wrote:
> On Mon, Nov 22, 2010 at 1:58 PM, David Jeske wrote:
>>
>> On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
>>>
>>> Not quite. The replica synchronization code is pretty messy, but
>>> basically it will take the longest replica that may have been synced, not a
>>> quorum.
>>> i.e. the guarantee is that "if you successfully sync() data, it will be
>>> present after replica synchronization". Unsynced data *may* be present after
>>> replica synchronization.
>>> But keep in mind that recovery is blocking in most cases - i.e. if the RS
>>> is writing to a pipeline and waiting on acks, and one of the nodes in the
>>> pipeline dies, then it will recover the pipeline (without the dead node) and
>>> continue syncing to the remaining two nodes. The client is still blocked at
>>> this point.
>>
>> I see. So it sounds like my statement #1 was wrong. Will the RS ever
>> timeout the write and fail in the face of not being able to push it to HDFS?
>> Is it correct to say:
>> Once a write is issued to HBase, it will either catastrophically fail (i.e.
>> disconnect), in which case the write will either have failed or succeeded,
>> and if it succeeded, future reads will always show that write?
>> As opposed to
>> Cassandra, which in all configurations where reads allow a subset of all
>> nodes, can "fail" a write while having the write show a temporary period of
>> inconsistency (depending on who you talk to), followed by the write either
>> applying or not applying depending on whether or not it actually wrote a
>> single node during the "failure to meet the write consistency request"?
>
> Yes, this seems accurate to me.
>
>> Does Cassandra have any return result which distinguishes between these
>> two states:
>> 1 - your data was not written to any nodes (true failure)
>> 2 - your data was written to at least 1 node, but not enough to meet your
>> write-consistency count
>> ?

David,

Return messages such as "your data was written to at least 1 node but not
enough to meet your write-consistency count" do not help the situation: the
client that wrote the data would be aware of the inconsistency, but the other
clients would not. Thus it only makes sense to pass or fail entirely.
(Though it could be an interesting error message.)

Right, CASSANDRA-1314 only solves the memory overhead issue.

Another twist to throw into the "losing writes" conversation is that file
systems can lose writes as well :) unless you are choosing synchronous
options that most people (IMHO) do not use.

@Todd: good catch about caching HFile blocks. My point still applies, though:
caching HFile blocks on a single node vs. individual datums on N nodes may
not be more efficient, so terms like "slower" and "less efficient" can be
very misleading. Isn't caching only the item more efficient? In cases with
high random read, is evicting single keys more efficient than evicting blocks
in terms of memory churn? These are difficult questions to answer absolutely,
so bullet points such as "#Cassandra has slower this" are oversimplifications
of complex problems.
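As an aside, the recovery rule Todd describes above ("take the longest replica that may have been synced, not a quorum") can be sketched in a few lines. This is heavily simplified and the names are illustrative; real HDFS recovery also negotiates generation stamps and truncates replicas to a common length:

```python
# Sketch of the block-recovery rule described above: after a pipeline
# failure, the recovered block length is the longest any surviving
# replica reached -- not a quorum agreement. Simplified illustration.

def recover_block_length(replica_lengths, last_synced):
    """replica_lengths: bytes present on each surviving replica.
    last_synced: offset of the last successful sync()."""
    target = max(replica_lengths)
    # A successful sync() means every pipeline node acked that prefix,
    # so target >= last_synced and synced data always survives...
    assert target >= last_synced
    # ...while unsynced tail bytes that reached only the longest
    # replica *may* survive recovery too.
    return target
```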
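For what it's worth, Cassandra's Thrift interface does surface something close to David's two states: UnavailableException means too few replicas were live to even attempt the write, while TimedOutException means the write may have landed on some replicas without reaching the consistency count. A toy simulation of that distinction (class and exception names are illustrative, not the real client API):

```python
# Toy model of a quorum write, distinguishing the two failure states
# David asks about. Names are illustrative only; they echo (but are
# not) Cassandra's UnavailableException / TimedOutException.

class UnavailableError(Exception):
    """Too few live replicas: the write was never attempted anywhere."""

class TimedOutError(Exception):
    """Consistency count not reached in time: the write MAY exist on
    some replicas, visible to other clients despite the 'failure'."""
    def __init__(self, acks):
        super().__init__(f"only {acks} ack(s) received")
        self.acks = acks

class Replica:
    def __init__(self, alive=True, responsive=True):
        self.alive = alive            # node is up and reachable
        self.responsive = responsive  # node acks within the timeout
        self.data = {}

def quorum_write(replicas, key, value, required):
    live = [r for r in replicas if r.alive]
    if len(live) < required:
        # State 1: rejected up front -- no node received the write.
        raise UnavailableError(f"{len(live)} live < {required} required")
    acks = 0
    for r in live:
        r.data[key] = value           # the mutation lands on this replica...
        if r.responsive:
            acks += 1                 # ...but only responsive nodes ack in time
    if acks < required:
        # State 2: written to >= 1 node, below the consistency count.
        raise TimedOutError(acks)
    return acks
```

Note that in state 2 the "failed" write is already sitting on a replica, which is exactly why reporting the distinction to the writing client alone does not fix anything for the other clients.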
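To make the block-vs-datum caching point concrete, here is the back-of-envelope arithmetic for the high-random-read case. All sizes here are invented round numbers for illustration, not measurements:

```python
# Back-of-envelope comparison of caching whole HFile blocks versus
# caching individual rows, for a random-read workload where hot keys
# are scattered across distinct blocks. Hypothetical sizes throughout.

BLOCK = 64 * 1024      # assumed block-cache granularity (bytes)
ROW = 100              # assumed average row size (bytes)
HOT_ROWS = 1000        # random hot keys, each in a different block
REPLICAS = 3           # a row cache holds the row on every replica

block_cache = HOT_ROWS * BLOCK          # one full block pinned per hot row
row_cache = HOT_ROWS * ROW * REPLICAS   # just the row, times N nodes

print(f"block cache: {block_cache / 2**20:.1f} MiB")
print(f"row cache (x{REPLICAS} nodes): {row_cache / 2**10:.1f} KiB")
```

Under these (invented) numbers the per-row cache wins by orders of magnitude even after paying for N replicas; with hot keys clustered in few blocks, or large rows, the comparison flips. That is the sense in which a flat "slower / less efficient" bullet hides the workload dependence.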