Date: Mon, 25 Feb 2013 21:14:14 +0000 (UTC)
From: "Cristian Opris (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-5062) Support CAS

    [ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586290#comment-13586290 ]

Cristian Opris commented on CASSANDRA-5062:
-------------------------------------------

This shouldn't be too complicated with Paxos leader election, very similar to Spinnaker. I don't think it requires changing the read/write paths at the lower level, at least not significantly.

Assume for the sake of simplicity that we use a column prefix to encode the version. The leader elected should always be the one that has the latest version.
This allows the leader to perform the read-modify-write (conditional update) locally and, if it succeeds, do a simple quorum write to propagate the result. The leader can also increment the version sequentially. Conflicting writes from other replicas cannot succeed, because any node that wants to write needs to get itself elected leader first.

Since we do quorum writes, not all replicas will have the full sequence of versions, but regular anti-entropy (read-repair) on quorum reads should take care of that. If the leader fails, the newly elected leader will necessarily be the one that has the latest write, so it can continue to do CAS locally. Anti-entropy should also take care of recovery and catch-up of a replica, just as it does now.

I believe this can all be done on top of existing functionality without major changes to the read/write paths. You could also reuse the Zab algorithm from ZK for expediency, without having to use the entire ZK codebase.

> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions. The classic example is user account creation: we want to ensure usernames are unique, so we only want to signal account creation success if nobody else has created the account yet. But naive read-then-write allows clients to race and both think they have a green light to create.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
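
To make the scheme in the comment above concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not Cassandra code: the names `Replica`, `elect_leader`, and `cas` are hypothetical, the election is a stand-in that simply picks the reachable replica with the highest version (a real implementation would run Paxos or Zab), and `expected=None` is taken to mean "the key must not exist yet".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (version, value); version 0 with value None means "key absent".
Versioned = Tuple[int, Optional[str]]


@dataclass
class Replica:
    """Hypothetical in-memory replica holding versioned values."""
    name: str
    store: Dict[str, Versioned] = field(default_factory=dict)

    def read(self, key: str) -> Versioned:
        return self.store.get(key, (0, None))

    def apply(self, key: str, version: int, value: str) -> None:
        # Accept only strictly newer versions; stale writes are ignored.
        if version > self.read(key)[0]:
            self.store[key] = (version, value)


def elect_leader(reachable: List[Replica], key: str) -> Replica:
    # Stand-in for a real election (assumption): choose the reachable
    # replica that holds the latest version of the key.
    return max(reachable, key=lambda r: r.read(key)[0])


def cas(replicas: List[Replica], key: str, expected: Optional[str],
        new_value: str, reachable: Optional[List[Replica]] = None) -> bool:
    """Conditional update: check on the leader, then write to a quorum."""
    reachable = replicas if reachable is None else reachable
    quorum = len(replicas) // 2 + 1
    if len(reachable) < quorum:
        return False  # no quorum available, so no leader and no write

    leader = elect_leader(reachable, key)
    version, current = leader.read(key)
    if current != expected:
        return False  # condition failed on the leader's local copy

    # The leader increments the version and propagates to a bare quorum,
    # so some replicas may lag until anti-entropy catches them up.
    followers = [r for r in reachable if r is not leader][: quorum - 1]
    for replica in [leader] + followers:
        replica.apply(key, version + 1, new_value)
    return True
```

With three replicas, a first `cas(replicas, "user:alice", None, "client-1")` creates the account, and a second attempt with `expected=None` is rejected because the leader already holds version 1. If the first replica then becomes unreachable, a leader chosen from the remaining two still sees version 1, since any two quorums overlap, which is the property the comment relies on for failover.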