cassandra-client-dev mailing list archives

From: Daniel Godás <dgo...@gmail.com>
Subject: Re: Implementing a locking mechanism - RFC
Date: Fri, 01 Feb 2013 23:22:02 GMT
Alexis,

I did some more research and I will go with ZooKeeper for locking and
Cassandra for storage. This way I can use the locks in ZooKeeper to
delimit transactions in Cassandra. Once the distributed locking
problem is solved I don't really need a distributed queue, but it comes
for free with ZooKeeper too.
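
In case it is useful to anyone else, the locking side would look roughly
like this with Apache Curator's InterProcessMutex recipe (a sketch, not
tested; the connection string and lock path are placeholders):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLockExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; point this at the real ZooKeeper ensemble.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One znode path per logical lock; all clients must agree on it.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/my_lock_id");
        lock.acquire();
        try {
            // Critical section: the Cassandra reads/writes that form one
            // "transaction" run here while no other client holds the lock.
        } finally {
            lock.release();
        }
        client.close();
    }
}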

Regards,
Dan

2013/2/1 Alexis Wilke <alexis@m2osw.com>:
> Hi Daniel,
>
> Note that I did implement the Lamport bakery algorithm, using Cassandra only,
> in the libQtCassandra (C++) library that I'm working on. That's a lock which uses
> QUORUM. So as mentioned earlier, you may get in trouble with reads if you
> have problems with some nodes (but from what I understand, if any one write
> uses QUORUM you run into the same problem anyway.)
>
> Frankly, in my website environment, not having what I would call a *standard
> lock* is a really annoying problem! I currently need it in only two
> circumstances: a user registers an account (which must be referenced by a
> unique email address), and a user creates a "new page" (each page must be
> unique on a per-website basis; also, in our environment a "page" means any
> type of content but in most cases each one of those will be unique anyway.)
>
> A queue is not so good for us because 99.9% of the time we want a quick
> response and if two users try to register with different email addresses we
> get an "instant lock." I will no need to wait on my locks because I'll be
> locking a unique piece of content. The penalty is about 3 reads, 3 writes
> and 2 drops total (lock+unlock) if you are the only person acquiring the
> lock. Additional reads and sleeps are required if others acquired the lock.
> I tested with 24 processes total on 3 client computers, running my test
> against a Cassandra cluster made of 3 nodes, and got the expected result
> every second for 300 seconds (5 min.), i.e. 24 x 300 = 7,200 successful
> locks, which clashed every second.
>
> You may be interested in checking those out for more details:
>
> http://snapwebsites.org/journal/2013/01/inter-process-inter-computer-lock-cassandra
>
> http://snapwebsites.org/project/libqtcassandra
>
> http://snapwebsites.org/mo_references_view/libQtCassandra-doc-0.4/classQtCassandra_1_1QCassandraLock.html#details
>
>
> ------------
> Alexis Wilke
> CEO
> Made to Order Software Corporation
> http://snapwebsites.org/
>
>
>
>
> Daniel Godás wrote:
>>
>> This sounds reasonable. Thanks for the guidance, I think I just cut
>> down this project's development time by half...
>>
>> 2013/1/31 Oleg Dulin <oleg.dulin@liquidanalytics.com>:
>>
>>>
>>> QUORUM means that (RF/2+1) nodes must respond. See this calculator:
>>>
>>> http://www.ecyrd.com/cassandracalculator/
>>>
>>> If you use RF=2 and you have 4 nodes you cannot survive the outage of any
>>> node if you use QUORUM.
>>>
>>> So you are introducing a point of failure.
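>>>
>>> (To spell out the arithmetic -- a throwaway snippet; note Java's
>>> integer division:)
>>>
>>> public class QuorumMath {
>>>     public static void main(String[] args) {
>>>         int rf = 2;                   // replication factor
>>>         int quorum = rf / 2 + 1;      // QUORUM = floor(RF/2) + 1 = 2
>>>         // With RF=2, QUORUM needs both replicas to answer, so losing
>>>         // either replica node fails QUORUM reads/writes on its ranges.
>>>         System.out.println("RF=" + rf + " -> QUORUM=" + quorum);
>>>     }
>>> }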
>>>
>>> If you need to do reads before writes I strongly suggest a caching
>>> mechanism, and if you want to do locking I suggest queues. I wouldn't send
>>> CQL statements on queues, though that works. I would send Java business
>>> objects, but that's my personal preference.
>>>
>>> If you are frequently updating and reading the same data, that's an
>>> anti-pattern in Cassandra. Reads will be competing for resources with
>>> compactions, etc. And of course, unless you use QUORUM or ALL you really
>>> shouldn't rely on a read-before-write.
>>>
>>>
>>> Regards,
>>> Oleg Dulin
>>> Please note my new office #: 732-917-0159
>>>
>>> On Jan 31, 2013, at 9:35 AM, Daniel Godás <dgodas@gmail.com> wrote:
>>>
>>>
>>>>
>>>> Can I not get away with it using QUORUM? Correct me if I'm wrong but I
>>>> think the only way I can get a read consistency issue using QUORUM is
>>>> if 2 nodes fail after a write and before the TTL expires. I can live
>>>> with the overhead of using QUORUM for the locking operations as they
>>>> won't be used often.
>>>>
>>>> 2013/1/31 Oleg Dulin <oleg.dulin@liquidanalytics.com>:
>>>>
>>>>>
>>>>> The problem with using Cassandra for locking is that if you ever have
>>>>> read consistency issues (which you will unless you use ALL) you will
>>>>> have inconsistent values.
>>>>>
>>>>> In general I would avoid doing a read-before-write with Cassandra. I
>>>>> would come up with another way to update my data.
>>>>>
>>>>> Regards,
>>>>> Oleg Dulin
>>>>> Please note my new office #: 732-917-0159
>>>>>
>>>>> On Jan 31, 2013, at 9:19 AM, Daniel Godás <dgodas@gmail.com> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Ok, I've done some reading. If I understood it correctly the idea
>>>>>> would be to send messages to the queue that contain a transaction, i.e.
>>>>>> a list of CQL commands to be run atomically. When one of the consumers
>>>>>> gets the message it can run the transaction atomically before allowing
>>>>>> another consumer to get the next message. If this is correct then in
>>>>>> order to handle cases in which I need to interleave code with the CQL
>>>>>> statements, e.g. to check retrieved values, I need to implement a
>>>>>> protocol that uses the message queue as a locking mechanism. How is
>>>>>> this better than using cassandra for locking? (using the algorithm I
>>>>>> proposed or another one).
>>>>>>
>>>>>> 2013/1/31 Oleg Dulin <oleg.dulin@liquidanalytics.com>:
>>>>>>
>>>>>>>
>>>>>>> This may help:
>>>>>>>
>>>>>>> http://activemq.apache.org/how-do-distributed-queues-work.html
>>>>>>>
>>>>>>> http://activemq.apache.org/topologies.html
>>>>>>>
>>>>>>>
>>>>>>> http://activemq.apache.org/how-do-i-embed-a-broker-inside-a-connection.html
>>>>>>>
>>>>>>> Although I would use ActiveMQ Spring configuration, not write code.
>>>>>>> But the point is -- you can have multiple processes participating in an
>>>>>>> ActiveMQ federation; you can configure AMQ's fault tolerance profiles to
>>>>>>> your liking without having to set up yet another server with a single
>>>>>>> point of failure.
>>>>>>>
>>>>>>> You have a single distributed queue. Each process has a writer
>>>>>>> consumer on that queue. AMQ knows to load balance; only one consumer
>>>>>>> at a time gets to write. Instead of writing to Cassandra, you send
>>>>>>> your data item to the queue. The next available consumer gets the
>>>>>>> message and writes it -- all in the order of messages on the queue,
>>>>>>> and only one consumer writer at a time.
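>>>>>>>
>>>>>>> Roughly, each writer process would run something like this (a minimal
>>>>>>> sketch; the broker URL and queue name are placeholders, and the
>>>>>>> Cassandra write itself is left as a stub):
>>>>>>>
>>>>>>> import javax.jms.*;
>>>>>>> import org.apache.activemq.ActiveMQConnectionFactory;
>>>>>>>
>>>>>>> public class QueueWriter {
>>>>>>>     public static void main(String[] args) throws JMSException {
>>>>>>>         ConnectionFactory factory =
>>>>>>>                 new ActiveMQConnectionFactory("tcp://localhost:61616");
>>>>>>>         Connection connection = factory.createConnection();
>>>>>>>         connection.start();
>>>>>>>         Session session =
>>>>>>>                 connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
>>>>>>>         MessageConsumer consumer =
>>>>>>>                 session.createConsumer(session.createQueue("cassandra.writes"));
>>>>>>>         while (true) {
>>>>>>>             // Blocks until AMQ dispatches the next message; with several
>>>>>>>             // competing consumers, each message goes to exactly one of them.
>>>>>>>             Message message = consumer.receive();
>>>>>>>             if (message instanceof ObjectMessage) {
>>>>>>>                 Object item = ((ObjectMessage) message).getObject();
>>>>>>>                 // ... write 'item' to Cassandra here, one message at a time ...
>>>>>>>             }
>>>>>>>         }
>>>>>>>     }
>>>>>>> }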
>>>>>>>
>>>>>>> Regards,
>>>>>>> Oleg Dulin
>>>>>>> Please note my new office #: 732-917-0159
>>>>>>>
>>>>>>> On Jan 31, 2013, at 8:11 AM, Daniel Godás <dgodas@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Sounds good, I'll try it out. Thanks for the help.
>>>>>>>>
>>>>>>>> 2013/1/31 Oleg Dulin <oleg.dulin@liquidanalytics.com>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Use embedded AMQ brokers, no need to set up any servers. It
>>>>>>>>> literally is one line of code to turn it on, and 5 lines of code
>>>>>>>>> to implement what you want.
>>>>>>>>>
>>>>>>>>> We have a cluster of servers writing to Cassandra this way and we
>>>>>>>>> are not using any J2EE containers.
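>>>>>>>>>
>>>>>>>>> (For reference, the "one line" is presumably something like the
>>>>>>>>> vm:// transport, which starts an in-process broker on first use --
>>>>>>>>> a sketch, with the URL options as an assumption:)
>>>>>>>>>
>>>>>>>>> import javax.jms.Connection;
>>>>>>>>> import javax.jms.JMSException;
>>>>>>>>> import org.apache.activemq.ActiveMQConnectionFactory;
>>>>>>>>>
>>>>>>>>> public class EmbeddedBroker {
>>>>>>>>>     public static Connection connect() throws JMSException {
>>>>>>>>>         // vm:// creates/attaches to an embedded broker in this JVM;
>>>>>>>>>         // broker.persistent=false keeps it memory-only.
>>>>>>>>>         return new ActiveMQConnectionFactory(
>>>>>>>>>                 "vm://localhost?broker.persistent=false").createConnection();
>>>>>>>>>     }
>>>>>>>>> }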
>>>>>>>>>
>>>>>>>>> On Thursday, January 31, 2013, Daniel Godás wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Doesn't that require you to set up a server for the message queue
>>>>>>>>>> and know its address? That sort of defeats the purpose of having a
>>>>>>>>>> database like Cassandra in which all nodes are equal and there's
>>>>>>>>>> no single point of failure.
>>>>>>>>>>
>>>>>>>>>> 2013/1/31 Oleg Dulin <oleg.dulin@liquidanalytics.com>:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Use a JMS message queue to send objects you want to write. Your
>>>>>>>>>>> writer process then will listen on this queue and write to
>>>>>>>>>>> Cassandra. This ensures that all writes happen in an orderly
>>>>>>>>>>> fashion, one batch at a time.
>>>>>>>>>>>
>>>>>>>>>>> I suggest ActiveMQ. It is easy to set up. This is what we use for
>>>>>>>>>>> this type of use case. No need to overcomplicate this with
>>>>>>>>>>> Cassandra.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Oleg Dulin
>>>>>>>>>>> Please note my new office #: 732-917-0159
>>>>>>>>>>>
>>>>>>>>>>> On Jan 31, 2013, at 6:35 AM, Daniel Godás <dgodas@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I need a locking mechanism on top of Cassandra so that multiple
>>>>>>>>>>>> clients can protect a critical section. I've seen some attempts,
>>>>>>>>>>>> including Dominic Williams' wait chain algorithm, but I think it
>>>>>>>>>>>> can be simplified. This is the procedure I wrote to implement a
>>>>>>>>>>>> simple mutex. Note that it hasn't been thoroughly tested and I
>>>>>>>>>>>> have been using Cassandra for a very short time, so I'd appreciate
>>>>>>>>>>>> any comments on obvious errors or things I'm doing plain wrong
>>>>>>>>>>>> that will never work.
>>>>>>>>>>>>
>>>>>>>>>>>> The assumptions and requirements for the algorithm are the same as
>>>>>>>>>>>> Dominic Williams'
>>>>>>>>>>>> (http://media.fightmymonster.com/Shared/docs/Wait%20Chain%20Algorithm.pdf).
>>>>>>>>>>>>
>>>>>>>>>>>> We will create a column family for the locks, referred to as
>>>>>>>>>>>> "locks" throughout this procedure. The column family contains two
>>>>>>>>>>>> columns: an identifier for the lock, which will also be the column
>>>>>>>>>>>> key ("id"), and a counter ("c"). Throughout the procedure
>>>>>>>>>>>> "my_lock_id" will be used as the lock identifier. An arbitrary
>>>>>>>>>>>> time-to-live value is required by the algorithm. This value will
>>>>>>>>>>>> be referred to as "t". Choosing an appropriate value for "t" will
>>>>>>>>>>>> be postponed until the algorithm is deemed good.
>>>>>>>>>>>>
>>>>>>>>>>>> === begin procedure ===
>>>>>>>>>>>>
>>>>>>>>>>>> (A) When a client needs to access the critical section the
>>>>>>>>>>>> following steps are taken:
>>>>>>>>>>>>
>>>>>>>>>>>> --- begin ---
>>>>>>>>>>>>
>>>>>>>>>>>> 1) SELECT c FROM locks WHERE id = my_lock_id
>>>>>>>>>>>> 2) if c = 0 try to acquire the lock (B), else don't try (C)
>>>>>>>>>>>>
>>>>>>>>>>>> --- end ---
>>>>>>>>>>>>
>>>>>>>>>>>> (B) Try to acquire the lock:
>>>>>>>>>>>>
>>>>>>>>>>>> --- begin ---
>>>>>>>>>>>>
>>>>>>>>>>>> 1) UPDATE locks USING TTL t SET c = c + 1 WHERE id = my_lock_id
>>>>>>>>>>>> 2) SELECT c FROM locks WHERE id = my_lock_id
>>>>>>>>>>>> 3) if c = 1 we acquired the lock (D), else we didn't (C)
>>>>>>>>>>>>
>>>>>>>>>>>> --- end ---
>>>>>>>>>>>>
>>>>>>>>>>>> (C) Wait before re-trying:
>>>>>>>>>>>>
>>>>>>>>>>>> --- begin ---
>>>>>>>>>>>>
>>>>>>>>>>>> 1) sleep for a random time higher than t and start at (A) again
>>>>>>>>>>>>
>>>>>>>>>>>> --- end ---
>>>>>>>>>>>>
>>>>>>>>>>>> (D) Execute the critical section and release the lock:
>>>>>>>>>>>>
>>>>>>>>>>>> --- begin ---
>>>>>>>>>>>>
>>>>>>>>>>>> 1) start a background thread that increments c with TTL = t every
>>>>>>>>>>>> t / 2 interval (UPDATE locks USING TTL t SET c = c + 1
>>>>>>>>>>>> WHERE id = my_lock_id)
>>>>>>>>>>>> 2) execute the critical section
>>>>>>>>>>>> 3) kill the background thread
>>>>>>>>>>>> 4) DELETE FROM locks WHERE id = my_lock_id
>>>>>>>>>>>>
>>>>>>>>>>>> --- end ---
>>>>>>>>>>>>
>>>>>>>>>>>> === end procedure ===
>>>>>>>>>>>>
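>>>>>>>>>>>> For concreteness, a rough Java translation of (A)-(D) might look
>>>>>>>>>>>> like this (an untested sketch using the DataStax Java driver;
>>>>>>>>>>>> t = 30 seconds is an arbitrary placeholder and the keep-alive
>>>>>>>>>>>> thread from step D.1 is omitted for brevity):
>>>>>>>>>>>>
>>>>>>>>>>>> import com.datastax.driver.core.Row;
>>>>>>>>>>>> import com.datastax.driver.core.Session;
>>>>>>>>>>>> import java.util.Random;
>>>>>>>>>>>>
>>>>>>>>>>>> public class SimpleMutex {
>>>>>>>>>>>>     static final int T = 30; // the arbitrary TTL "t", in seconds
>>>>>>>>>>>>     static final Random RANDOM = new Random();
>>>>>>>>>>>>
>>>>>>>>>>>>     // (A)-(C): loop until the lock is acquired
>>>>>>>>>>>>     public static void lock(Session session, String lockId)
>>>>>>>>>>>>             throws InterruptedException {
>>>>>>>>>>>>         while (true) {
>>>>>>>>>>>>             // (A) read the counter; a missing row counts as c = 0
>>>>>>>>>>>>             Row row = session.execute(
>>>>>>>>>>>>                     "SELECT c FROM locks WHERE id = '" + lockId + "'").one();
>>>>>>>>>>>>             long c = (row == null) ? 0 : row.getLong("c");
>>>>>>>>>>>>             if (c == 0) {
>>>>>>>>>>>>                 // (B.1) increment with TTL, (B.2) re-read
>>>>>>>>>>>>                 session.execute("UPDATE locks USING TTL " + T
>>>>>>>>>>>>                         + " SET c = c + 1 WHERE id = '" + lockId + "'");
>>>>>>>>>>>>                 row = session.execute(
>>>>>>>>>>>>                         "SELECT c FROM locks WHERE id = '" + lockId + "'").one();
>>>>>>>>>>>>                 // (B.3) c = 1 means we won; anything else means a clash
>>>>>>>>>>>>                 if (row != null && row.getLong("c") == 1) {
>>>>>>>>>>>>                     return;
>>>>>>>>>>>>                 }
>>>>>>>>>>>>             }
>>>>>>>>>>>>             // (C) sleep for a random time higher than t, then retry
>>>>>>>>>>>>             Thread.sleep(1000L * (T + 1 + RANDOM.nextInt(T)));
>>>>>>>>>>>>         }
>>>>>>>>>>>>     }
>>>>>>>>>>>>
>>>>>>>>>>>>     // (D.4) release the lock
>>>>>>>>>>>>     public static void unlock(Session session, String lockId) {
>>>>>>>>>>>>         session.execute("DELETE FROM locks WHERE id = '" + lockId + "'");
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>>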
>>>>>>>>>>>> Looking forward to reading your comments,
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sent from Gmail Mobile
>>>>>>>>>
