Subject: Re: Implementing a locking mechanism - RFC
From: Daniel Godás <dgodas@gmail.com>
To: client-dev@cassandra.apache.org
Date: Thu, 31 Jan 2013 15:24:39 +0000

This sounds reasonable. Thanks for the guidance; I think I just cut this project's development time in half...

2013/1/31 Oleg Dulin:
> QUORUM means that (RF/2 + 1) nodes must respond. See this calculator:
>
> http://www.ecyrd.com/cassandracalculator/
>
> If you use RF=2 and you have 4 nodes, you cannot survive the outage of any node if you use QUORUM.
>
> So you are introducing a point of failure.
>
> If you need to do reads before writes, I strongly suggest a caching mechanism, and if you want to do locking, I suggest queues. I wouldn't send CQL statements on the queues, though that works; I would send Java business objects, but that's my personal preference.
>
> If you are frequently updating and reading the same data, that's an anti-pattern in Cassandra: reads will compete for resources with compactions, etc. And of course, unless you use QUORUM or ALL, you really shouldn't rely on a read-before-write.
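[Editor's note: Oleg's (RF/2 + 1) figure and the RF=2 claim can be checked with a few lines. This is a sketch of the arithmetic only; the class and method names are mine, not part of any Cassandra client API.]

```java
public class QuorumMath {
    // Quorum for a replication factor: floor(RF / 2) + 1 replicas must respond.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // With QUORUM reads and writes, how many replica outages can be tolerated?
    static int tolerableFailures(int rf) {
        return rf - quorum(rf);
    }

    public static void main(String[] args) {
        // RF=2: quorum is 2, so both replicas must answer -> no replica may be down.
        System.out.println(quorum(2));            // 2
        System.out.println(tolerableFailures(2)); // 0
        // RF=3: quorum is 2, so one replica may be down.
        System.out.println(quorum(3));            // 2
        System.out.println(tolerableFailures(3)); // 1
    }
}
```

With RF=2 the quorum equals the replication factor itself, which is why losing any node that holds a replica makes QUORUM operations on that data fail.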
>
> Regards,
> Oleg Dulin
> Please note my new office #: 732-917-0159
>
> On Jan 31, 2013, at 9:35 AM, Daniel Godás wrote:
>
>> Can I not get away with it using QUORUM? Correct me if I'm wrong, but I
>> think the only way I can get a read consistency issue using QUORUM is
>> if 2 nodes fail after a write and before the TTL expires. I can live
>> with the overhead of using QUORUM for the locking operations, as they
>> won't be used often.
>>
>> 2013/1/31 Oleg Dulin:
>>> The problem with using Cassandra for locking is that if you ever have
>>> read consistency issues (which you will, unless you use ALL) you will
>>> have inconsistent values.
>>>
>>> In general I would avoid doing a read-before-write with Cassandra. I
>>> would come up with another way to update my data.
>>>
>>> Regards,
>>> Oleg Dulin
>>>
>>> On Jan 31, 2013, at 9:19 AM, Daniel Godás wrote:
>>>
>>>> Ok, I've done some reading. If I understood it correctly, the idea
>>>> would be to send messages to the queue that contain a transaction, i.e.
>>>> a list of CQL commands to be run atomically. When one of the consumers
>>>> gets the message, it can run the transaction atomically before allowing
>>>> another consumer to get the next message. If this is correct, then in
>>>> order to handle cases in which I need to interleave code with the CQL
>>>> statements, e.g. to check retrieved values, I need to implement a
>>>> protocol that uses the message queue as a locking mechanism. How is
>>>> this better than using Cassandra for locking (with the algorithm I
>>>> proposed or another one)?
>>>>
>>>> 2013/1/31 Oleg Dulin:
>>>>> This may help:
>>>>>
>>>>> http://activemq.apache.org/how-do-distributed-queues-work.html
>>>>>
>>>>> http://activemq.apache.org/topologies.html
>>>>>
>>>>> http://activemq.apache.org/how-do-i-embed-a-broker-inside-a-connection.html
>>>>>
>>>>> Although I would use ActiveMQ Spring configuration, not write code.
>>>>> But the point is -- you can have multiple processes participating in an
>>>>> ActiveMQ federation; you can configure AMQ's fault-tolerance profiles to
>>>>> your liking without having to set up yet another server with a single
>>>>> point of failure.
>>>>>
>>>>> You have a single distributed queue. Each process has a writer consumer
>>>>> on that queue. AMQ knows to load-balance; only one consumer at a time
>>>>> gets to write. Instead of writing to Cassandra, you send your data item
>>>>> to the queue. The next available consumer gets the message and writes
>>>>> it -- all in the order of messages on the queue, and only one consumer
>>>>> writing at a time.
>>>>>
>>>>> Regards,
>>>>> Oleg Dulin
>>>>>
>>>>> On Jan 31, 2013, at 8:11 AM, Daniel Godás wrote:
>>>>>
>>>>>> Sounds good, I'll try it out. Thanks for the help.
>>>>>>
>>>>>> 2013/1/31 Oleg Dulin:
>>>>>>> Use embedded AMQ brokers; no need to set up any servers. It literally is
>>>>>>> one line of code to turn it on, and 5 lines of code to implement what you
>>>>>>> want.
>>>>>>>
>>>>>>> We have a cluster of servers writing to Cassandra this way, and we are not
>>>>>>> using any J2EE containers.
>>>>>>>
>>>>>>> On Thursday, January 31, 2013, Daniel Godás wrote:
>>>>>>>
>>>>>>>> Doesn't that require you to set up a server for the message queue and
>>>>>>>> know its address? That sort of defeats the purpose of having a
>>>>>>>> database like Cassandra in which all nodes are equal and there's no
>>>>>>>> single point of failure.
>>>>>>>>
>>>>>>>> 2013/1/31 Oleg Dulin:
>>>>>>>>> Use a JMS message queue to send the objects you want to write. Your
>>>>>>>>> writer process then listens on this queue and writes to Cassandra.
>>>>>>>>> This ensures that all writes happen in an orderly fashion, one batch
>>>>>>>>> at a time.
>>>>>>>>>
>>>>>>>>> I suggest ActiveMQ. It is easy to set up, and it is what we use for
>>>>>>>>> this type of use case. No need to overcomplicate this with Cassandra.
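[Editor's note: the single-writer ordering Oleg describes can be sketched in-process with a plain `BlockingQueue`: producers enqueue instead of writing directly, and one writer thread drains the queue in order. This is only an analogy for what the distributed AMQ queue provides; the class and method names are mine and no ActiveMQ API is used.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SingleWriterSketch {
    // Producers enqueue; a single writer thread drains in order -- the
    // in-process analogue of a distributed queue with one active consumer.
    static List<String> drainInOrder(List<String> items) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(items.size() + 1);
        List<String> written = new ArrayList<>();

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String item = queue.take();
                    if (item.equals("STOP")) return; // shutdown sentinel
                    written.add(item);               // stand-in for the actual Cassandra write
                }
            } catch (InterruptedException ignored) { }
        });
        writer.start();

        // Any number of producers put items on the queue instead of writing directly.
        for (String item : items) queue.put(item);
        queue.put("STOP");
        writer.join();
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        // Writes come out in queue order, one at a time.
        System.out.println(drainInOrder(List.of("update-0", "update-1", "update-2")));
    }
}
```

The point of the pattern is that serialization happens at the queue, so no read-before-write against Cassandra is needed to order the updates.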
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Oleg Dulin
>>>>>>>>>
>>>>>>>>> On Jan 31, 2013, at 6:35 AM, Daniel Godás wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I need a locking mechanism on top of Cassandra so that multiple
>>>>>>>>>> clients can protect a critical section. I've seen some attempts,
>>>>>>>>>> including Dominic Williams' wait chain algorithm, but I think it can be
>>>>>>>>>> simplified. This is the procedure I wrote to implement a simple mutex.
>>>>>>>>>> Note that it hasn't been thoroughly tested, and I have been using
>>>>>>>>>> Cassandra for a very short time, so I'd appreciate any comments on
>>>>>>>>>> obvious errors or things I'm doing plain wrong that will never work.
>>>>>>>>>>
>>>>>>>>>> The assumptions and requirements for the algorithm are the same as
>>>>>>>>>> Dominic Williams'
>>>>>>>>>> (http://media.fightmymonster.com/Shared/docs/Wait%20Chain%20Algorithm.pdf).
>>>>>>>>>>
>>>>>>>>>> We will create a column family for the locks, referred to as "locks"
>>>>>>>>>> throughout this procedure. The column family contains two columns: an
>>>>>>>>>> identifier for the lock, which will also be the key ("id"), and
>>>>>>>>>> a counter ("c"). Throughout the procedure, "my_lock_id" will be used as
>>>>>>>>>> the lock identifier. An arbitrary time-to-live value is required by
>>>>>>>>>> the algorithm. This value will be referred to as "t". Choosing an
>>>>>>>>>> appropriate value for "t" is postponed until the algorithm is
>>>>>>>>>> deemed good.
>>>>>>>>>>
>>>>>>>>>> === begin procedure ===
>>>>>>>>>>
>>>>>>>>>> (A) When a client needs to access the critical section, the
>>>>>>>>>> following steps are taken:
>>>>>>>>>>
>>>>>>>>>> --- begin ---
>>>>>>>>>>
>>>>>>>>>> 1) SELECT c FROM locks WHERE id = my_lock_id
>>>>>>>>>> 2) if c = 0, try to acquire the lock (B); else don't try (C)
>>>>>>>>>>
>>>>>>>>>> --- end ---
>>>>>>>>>>
>>>>>>>>>> (B) Try to acquire the lock:
>>>>>>>>>>
>>>>>>>>>> --- begin ---
>>>>>>>>>>
>>>>>>>>>> 1) UPDATE locks USING TTL t SET c = c + 1 WHERE id = my_lock_id
>>>>>>>>>> 2) SELECT c FROM locks WHERE id = my_lock_id
>>>>>>>>>> 3) if c = 1, we acquired the lock (D); else we didn't (C)
>>>>>>>>>>
>>>>>>>>>> --- end ---
>>>>>>>>>>
>>>>>>>>>> (C) Wait before re-trying:
>>>>>>>>>>
>>>>>>>>>> --- begin ---
>>>>>>>>>>
>>>>>>>>>> 1) sleep for a random time longer than t and start at (A) again
>>>>>>>>>>
>>>>>>>>>> --- end ---
>>>>>>>>>>
>>>>>>>>>> (D) Execute the critical section and release the lock:
>>>>>>>>>>
>>>>>>>>>> --- begin ---
>>>>>>>>>>
>>>>>>>>>> 1) start a background thread that increments c with TTL = t every
>>>>>>>>>> t/2 interval (UPDATE locks USING TTL t SET c = c + 1 WHERE id =
>>>>>>>>>> my_lock_id)
>>>>>>>>>> 2) execute the critical section
>>>>>>>>>> 3) kill the background thread
>>>>>>>>>> 4) DELETE FROM locks WHERE id = my_lock_id
>>>>>>>>>>
>>>>>>>>>> --- end ---
>>>>>>>>>>
>>>>>>>>>> === end procedure ===
>>>>>>>>>>
>>>>>>>>>> Looking forward to reading your comments,
>>>>>>>>>> Dan
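[Editor's note: the counter logic in steps (A), (B), and (D) can be simulated in memory to see why the "c = 1" check detects contention: the sole incrementer sees c = 1, while racing clients that both increment see c = 2 and back off to (C). This is an in-process sketch with a plain map standing in for the "locks" column family; TTL expiry, replication, and consistency levels are not modeled, and the class and method names are mine.]

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LockCounterSketch {
    // Stand-in for the "locks" column family: lock id -> counter c.
    static final ConcurrentMap<String, Integer> locks = new ConcurrentHashMap<>();

    // Steps (A) + (B): read c; if 0, increment and re-read; hold the lock only if c == 1.
    static boolean tryAcquire(String id) {
        if (locks.getOrDefault(id, 0) != 0) return false; // (A2) someone holds it -> (C)
        int c = locks.merge(id, 1, Integer::sum);          // (B1) UPDATE ... SET c = c + 1
        return c == 1;                                     // (B3) sole incrementer wins -> (D)
    }

    // Step (D4): DELETE FROM locks WHERE id = ...
    static void release(String id) {
        locks.remove(id);
    }

    public static void main(String[] args) {
        System.out.println(tryAcquire("my_lock_id")); // true: first caller sees c == 1
        System.out.println(tryAcquire("my_lock_id")); // false: lock already held
        release("my_lock_id");
        System.out.println(tryAcquire("my_lock_id")); // true again after release
    }
}
```

Note that if two clients race past the (A) check, both increment and both observe c = 2, so neither acquires the lock and both retry via (C): that mirrors the proposal's collision behavior, where the TTL (not modeled here) eventually resets c to 0.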