Subject: Re: Implementing a locking mechanism - RFC
From: Daniel Godás <dgodas@gmail.com>
To: client-dev@cassandra.apache.org
Date: Thu, 31 Jan 2013 15:24:39 +0000

This sounds reasonable. Thanks for the guidance; I think I just cut this project's development time in half...

2013/1/31 Oleg Dulin:
> QUORUM means that (RF/2 + 1) nodes must respond. See this calculator:
>
> http://www.ecyrd.com/cassandracalculator/
>
> If you use RF=2 and you have 4 nodes, you cannot survive the outage of any node if you use QUORUM.
>
> So you are introducing a point of failure.
>
> If you need to do reads before writes, I strongly suggest a caching mechanism, and if you want to do locking, I suggest queues. I wouldn't send CQL statements on the queues, though that works; I would send Java business objects, but that's my personal preference.
>
> If you are frequently updating and reading the same data, that's an anti-pattern in Cassandra: reads will compete for resources with compactions, etc. And of course, unless you use QUORUM or ALL, you really shouldn't rely on a read-before-write.
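[Editor's note: Oleg's (RF/2 + 1) figure and the RF=2 claim can be checked with a few lines. This is a sketch of the arithmetic only; the class and method names are mine, not part of any Cassandra client API.]

```java
public class QuorumMath {
    // Quorum for a replication factor: floor(RF / 2) + 1 replicas must respond.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // With QUORUM reads and writes, how many replica outages can be tolerated?
    static int tolerableFailures(int rf) {
        return rf - quorum(rf);
    }

    public static void main(String[] args) {
        // RF=2: quorum is 2, so both replicas must answer -> no replica may be down.
        System.out.println(quorum(2));            // 2
        System.out.println(tolerableFailures(2)); // 0
        // RF=3: quorum is 2, so one replica may be down.
        System.out.println(quorum(3));            // 2
        System.out.println(tolerableFailures(3)); // 1
    }
}
```

With RF=2 the quorum equals the replication factor itself, which is why losing any node that holds a replica makes QUORUM operations on that data fail.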
>
> Regards,
> Oleg Dulin
> Please note my new office #: 732-917-0159
>
> On Jan 31, 2013, at 9:35 AM, Daniel Godás wrote:
>
>> Can I not get away with it using QUORUM? Correct me if I'm wrong, but I
>> think the only way I can get a read consistency issue using QUORUM is
>> if 2 nodes fail after a write and before the TTL expires. I can live
>> with the overhead of using QUORUM for the locking operations, as they
>> won't be used often.
>>
>> 2013/1/31 Oleg Dulin:
>>> The problem with using Cassandra for locking is that if you ever have
>>> read consistency issues (which you will, unless you use ALL) you will
>>> have inconsistent values.
>>>
>>> In general I would avoid doing a read-before-write with Cassandra. I
>>> would come up with another way to update my data.
>>>
>>> Regards,
>>> Oleg Dulin
>>>
>>> On Jan 31, 2013, at 9:19 AM, Daniel Godás wrote:
>>>
>>>> Ok, I've done some reading. If I understood it correctly, the idea
>>>> would be to send messages to the queue that contain a transaction, i.e.
>>>> a list of CQL commands to be run atomically. When one of the consumers
>>>> gets the message, it can run the transaction atomically before allowing
>>>> another consumer to get the next message. If this is correct, then in
>>>> order to handle cases in which I need to interleave code with the CQL
>>>> statements, e.g. to check retrieved values, I need to implement a
>>>> protocol that uses the message queue as a locking mechanism. How is
>>>> this better than using Cassandra for locking (with the algorithm I
>>>> proposed or another one)?
>>>>
>>>> 2013/1/31 Oleg Dulin:
>>>>> This may help:
>>>>>
>>>>> http://activemq.apache.org/how-do-distributed-queues-work.html
>>>>>
>>>>> http://activemq.apache.org/topologies.html
>>>>>
>>>>> http://activemq.apache.org/how-do-i-embed-a-broker-inside-a-connection.html
>>>>>
>>>>> Although I would use ActiveMQ Spring configuration, not write code.
>>>>> But the point is -- you can have multiple processes participating in an
>>>>> ActiveMQ federation; you can configure AMQ's fault-tolerance profiles to
>>>>> your liking without having to set up yet another server with a single
>>>>> point of failure.
>>>>>
>>>>> You have a single distributed queue. Each process has a writer consumer
>>>>> on that queue. AMQ knows to load-balance; only one consumer at a time
>>>>> gets to write. Instead of writing to Cassandra, you send your data item
>>>>> to the queue. The next available consumer gets the message and writes
>>>>> it -- all in the order of messages on the queue, and only one consumer
>>>>> writing at a time.
>>>>>
>>>>> Regards,
>>>>> Oleg Dulin
>>>>>
>>>>> On Jan 31, 2013, at 8:11 AM, Daniel Godás wrote:
>>>>>
>>>>>> Sounds good, I'll try it out. Thanks for the help.
>>>>>>
>>>>>> 2013/1/31 Oleg Dulin:
>>>>>>> Use embedded AMQ brokers; no need to set up any servers. It literally is
>>>>>>> one line of code to turn it on, and 5 lines of code to implement what you
>>>>>>> want.
>>>>>>>
>>>>>>> We have a cluster of servers writing to Cassandra this way, and we are not
>>>>>>> using any J2EE containers.
>>>>>>>
>>>>>>> On Thursday, January 31, 2013, Daniel Godás wrote:
>>>>>>>
>>>>>>>> Doesn't that require you to set up a server for the message queue and
>>>>>>>> know its address? That sort of defeats the purpose of having a
>>>>>>>> database like Cassandra in which all nodes are equal and there's no
>>>>>>>> single point of failure.
>>>>>>>>
>>>>>>>> 2013/1/31 Oleg Dulin:
>>>>>>>>> Use a JMS message queue to send the objects you want to write. Your
>>>>>>>>> writer process then listens on this queue and writes to Cassandra.
>>>>>>>>> This ensures that all writes happen in an orderly fashion, one batch
>>>>>>>>> at a time.
>>>>>>>>>
>>>>>>>>> I suggest ActiveMQ. It is easy to set up, and it is what we use for
>>>>>>>>> this type of use case. No need to overcomplicate this with Cassandra.
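[Editor's note: the single-writer ordering Oleg describes can be sketched in-process with a plain `BlockingQueue`: producers enqueue instead of writing directly, and one writer thread drains the queue in order. This is only an analogy for what the distributed AMQ queue provides; the class and method names are mine and no ActiveMQ API is used.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SingleWriterSketch {
    // Producers enqueue; a single writer thread drains in order -- the
    // in-process analogue of a distributed queue with one active consumer.
    static List<String> drainInOrder(List<String> items) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(items.size() + 1);
        List<String> written = new ArrayList<>();

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String item = queue.take();
                    if (item.equals("STOP")) return; // shutdown sentinel
                    written.add(item);               // stand-in for the actual Cassandra write
                }
            } catch (InterruptedException ignored) { }
        });
        writer.start();

        // Any number of producers put items on the queue instead of writing directly.
        for (String item : items) queue.put(item);
        queue.put("STOP");
        writer.join();
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        // Writes come out in queue order, one at a time.
        System.out.println(drainInOrder(List.of("update-0", "update-1", "update-2")));
    }
}
```

The point of the pattern is that serialization happens at the queue, so no read-before-write against Cassandra is needed to order the updates.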
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Oleg Dulin
>>>>>>>>>
>>>>>>>>> On Jan 31, 2013, at 6:35 AM, Daniel Godás wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I need a locking mechanism on top of Cassandra so that multiple
>>>>>>>>>> clients can protect a critical section. I've seen some attempts,
>>>>>>>>>> including Dominic Williams' wait chain algorithm, but I think it can be
>>>>>>>>>> simplified. This is the procedure I wrote to implement a simple mutex.
>>>>>>>>>> Note that it hasn't been thoroughly tested, and I have been using
>>>>>>>>>> Cassandra for a very short time, so I'd appreciate any comments on
>>>>>>>>>> obvious errors or things I'm doing plain wrong that will never work.
>>>>>>>>>>
>>>>>>>>>> The assumptions and requirements for the algorithm are the same as
>>>>>>>>>> Dominic Williams'
>>>>>>>>>> (http://media.fightmymonster.com/Shared/docs/Wait%20Chain%20Algorithm.pdf).
>>>>>>>>>>
>>>>>>>>>> We will create a column family for the locks, referred to as "locks"
>>>>>>>>>> throughout this procedure. The column family contains two columns: an
>>>>>>>>>> identifier for the lock, which will also be the key ("id"), and
>>>>>>>>>> a counter ("c"). Throughout the procedure, "my_lock_id" will be used as
>>>>>>>>>> the lock identifier. An arbitrary time-to-live value is required by
>>>>>>>>>> the algorithm. This value will be referred to as "t". Choosing an
>>>>>>>>>> appropriate value for "t" is postponed until the algorithm is
>>>>>>>>>> deemed good.
>>>>>>>>>>
>>>>>>>>>> === begin procedure ===
>>>>>>>>>>
>>>>>>>>>> (A) When a client needs to access the critical section, the
>>>>>>>>>> following steps are taken:
>>>>>>>>>>
>>>>>>>>>> --- begin ---
>>>>>>>>>>
>>>>>>>>>> 1) SELECT c FROM locks WHERE id = my_lock_id
>>>>>>>>>> 2) if c = 0, try to acquire the lock (B); else don't try (C)
>>>>>>>>>>
>>>>>>>>>> --- end ---
>>>>>>>>>>
>>>>>>>>>> (B) Try to acquire the lock:
>>>>>>>>>>
>>>>>>>>>> --- begin ---
>>>>>>>>>>
>>>>>>>>>> 1) UPDATE locks USING TTL t SET c = c + 1 WHERE id = my_lock_id
>>>>>>>>>> 2) SELECT c FROM locks WHERE id = my_lock_id
>>>>>>>>>> 3) if c = 1, we acquired the lock (D); else we didn't (C)
>>>>>>>>>>
>>>>>>>>>> --- end ---
>>>>>>>>>>
>>>>>>>>>> (C) Wait before re-trying:
>>>>>>>>>>
>>>>>>>>>> --- begin ---
>>>>>>>>>>
>>>>>>>>>> 1) sleep for a random time longer than t and start at (A) again
>>>>>>>>>>
>>>>>>>>>> --- end ---
>>>>>>>>>>
>>>>>>>>>> (D) Execute the critical section and release the lock:
>>>>>>>>>>
>>>>>>>>>> --- begin ---
>>>>>>>>>>
>>>>>>>>>> 1) start a background thread that increments c with TTL = t every
>>>>>>>>>> t/2 interval (UPDATE locks USING TTL t SET c = c + 1 WHERE id =
>>>>>>>>>> my_lock_id)
>>>>>>>>>> 2) execute the critical section
>>>>>>>>>> 3) kill the background thread
>>>>>>>>>> 4) DELETE FROM locks WHERE id = my_lock_id
>>>>>>>>>>
>>>>>>>>>> --- end ---
>>>>>>>>>>
>>>>>>>>>> === end procedure ===
>>>>>>>>>>
>>>>>>>>>> Looking forward to reading your comments,
>>>>>>>>>> Dan
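[Editor's note: the counter logic in steps (A), (B), and (D) can be simulated in memory to see why the "c = 1" check detects contention: the sole incrementer sees c = 1, while racing clients that both increment see c = 2 and back off to (C). This is an in-process sketch with a plain map standing in for the "locks" column family; TTL expiry, replication, and consistency levels are not modeled, and the class and method names are mine.]

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LockCounterSketch {
    // Stand-in for the "locks" column family: lock id -> counter c.
    static final ConcurrentMap<String, Integer> locks = new ConcurrentHashMap<>();

    // Steps (A) + (B): read c; if 0, increment and re-read; hold the lock only if c == 1.
    static boolean tryAcquire(String id) {
        if (locks.getOrDefault(id, 0) != 0) return false; // (A2) someone holds it -> (C)
        int c = locks.merge(id, 1, Integer::sum);          // (B1) UPDATE ... SET c = c + 1
        return c == 1;                                     // (B3) sole incrementer wins -> (D)
    }

    // Step (D4): DELETE FROM locks WHERE id = ...
    static void release(String id) {
        locks.remove(id);
    }

    public static void main(String[] args) {
        System.out.println(tryAcquire("my_lock_id")); // true: first caller sees c == 1
        System.out.println(tryAcquire("my_lock_id")); // false: lock already held
        release("my_lock_id");
        System.out.println(tryAcquire("my_lock_id")); // true again after release
    }
}
```

Note that if two clients race past the (A) check, both increment and both observe c = 2, so neither acquires the lock and both retry via (C): that mirrors the proposal's collision behavior, where the TTL (not modeled here) eventually resets c to 0.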