curator-user mailing list archives

From Michael Peterson <quu...@gmail.com>
Subject Re: InterProcessMutex doesn't detect deletion of lock file
Date Tue, 20 Jan 2015 16:15:28 GMT
Thanks. That bug is for LeaderLatch. Should I open another bug on
InterProcessMutex?  Or just add commentary to the CURATOR-171 issue?

Can anyone address my workaround option (Idea #3 above) - namely
implementing my own custom LockInternalsDriver and setting my own WATCH on
the lock file.  Any ideas if that will hit problems?
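
To make the question concrete, here is a rough stand-alone model of what I
understand the default driver's `getsTheLock` decision to be. It is only a
sketch of my reading, not Curator's actual code: the class `LockOrderSketch`,
the `Decision` holder and the `decide` method are hypothetical names, and I
am assuming children are ordered by the numeric suffix after the last '-'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified stand-alone model of the lock decision; NOT Curator's code.
final class LockOrderSketch {

    // Hypothetical stand-in for the two values the driver reports back.
    static final class Decision {
        final boolean getsTheLock;
        final String pathToWatch; // null when this node holds the lock

        Decision(boolean getsTheLock, String pathToWatch) {
            this.getsTheLock = getsTheLock;
            this.pathToWatch = pathToWatch;
        }
    }

    // Assumption: the sequence number is the digits after the last '-'.
    static long sequenceOf(String nodeName) {
        return Long.parseLong(nodeName.substring(nodeName.lastIndexOf('-') + 1));
    }

    // Lowest sequence number wins; every other node watches its predecessor.
    static Decision decide(List<String> children, String ourNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(LockOrderSketch::sequenceOf));
        int ourIndex = sorted.indexOf(ourNode);
        if (ourIndex < 0) {
            throw new IllegalStateException("our node is missing: " + ourNode);
        }
        if (ourIndex == 0) {
            return new Decision(true, null);
        }
        return new Decision(false, sorted.get(ourIndex - 1));
    }

    public static void main(String[] args) {
        // Node names taken from the second zkCli listing in my first mail.
        List<String> children = Arrays.asList(
            "_c_63490235-7ab6-461d-bab2-401d4439db4f-lock-0000000018",
            "_c_1e57c64e-b990-4f9a-96f9-fccf56c0421e-lock-0000000012",
            "_c_f09ee1e5-0e47-47a7-961e-d7745ffbfc28-lock-0000000017",
            "_c_2f9ebe06-b91c-4886-b916-34ff1fa83541-lock-0000000016");
        for (String node : children) {
            Decision d = decide(children, node);
            System.out.println(node + " getsTheLock=" + d.getsTheLock
                + " pathToWatch=" + d.pathToWatch);
        }
    }
}
```

If that reading is right, the holder never has a path to watch, which would
explain why pathToWatch is null in my setup and why nobody notices when the
holder's own node is deleted. Hence the manual WATCH in my workaround.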

On Tue, Jan 20, 2015 at 10:46 AM, John Vines <vines@apache.org> wrote:

> Sounds similar to https://issues.apache.org/jira/browse/CURATOR-171
>
> On Tue, Jan 20, 2015 at 10:23 AM, Michael Peterson <quux00@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am fairly new to Curator and ZK, so apologies if this has been asked
>> before.  I haven't found anything yet that addresses it.
>>
>> My ZK use case is very simple - HA failover.  Two processes get launched
>> - one does the work and the other waits to take over in case the other dies
>> or otherwise stops working.
>>
>> The Curator InterProcessMutex fits the bill.  However, without too much
>> effort I've found a scenario where Process A and Process B both think
>> they are the owner at the same time and start doing the work, causing data
>> corruption.
>>
>> The scenario is simply to delete the lock file, which I did via the ZK
>> CLI (zkCli.sh).  The problem is that the InterProcessMutex currently
>> holding the lock doesn't seem to notice that the lock file got deleted, but
>> the InterProcessMutex in the waiting (failover) process *does* notice and
>> creates a new lock and starts doing work.
>>
>> Does the InterProcessMutex set a watch on the lock file it creates?  If
>> not, why not?
>>
>>
>> Idea #1:
>>
>> I tried setting all the Listeners I could figure out how to set to detect the
>> NodeDeleted event:
>>
>> - CuratorListener
>> - ConnectionStateListener
>> - UnhandledErrorListener
>>
>> but none get signaled when I manually delete the lock file.
>>
>>
>> Idea #2:
>>
>> Is the solution to set my own watch on the lock file that the IPMutex
>> created?  If so, I see that one way to get the file name of the lock is to
>> call InterProcessMutex#getParticipantNodes().  But the problem is that
>> there can be more than one lock file - it seems
>>
>>     [zk: localhost:2181(CONNECTED) 7] ls /XXX/masterlock
>>     [_c_c1dc399d-b6e4-4051-bd5c-2e300e62bc58-lock-0000000003,
>>      _c_bf5de8b2-ed33-4f89-a737-4061f2072c3f-lock-0000000000]
>>
>>     [zk: localhost:2181(CONNECTED) 37] ls /XXX/masterlock
>>     [_c_63490235-7ab6-461d-bab2-401d4439db4f-lock-0000000018, \
>>      _c_1e57c64e-b990-4f9a-96f9-fccf56c0421e-lock-0000000012, \
>>      _c_f09ee1e5-0e47-47a7-961e-d7745ffbfc28-lock-0000000017, \
>>      _c_2f9ebe06-b91c-4886-b916-34ff1fa83541-lock-0000000016]
>>
>> And it seems that I can't use the one with the smallest sequential lock
>> number, because the smallest one might be hanging around from a crashed
>> lockholder and it hasn't expired yet - that is the case in the above example:
>> lock-0000000012 is just waiting to be expired after a crash.
>>
>> So I don't know how to tell which lock is "mine" to set a watch on using
>> that method.
>>
>>
>>
>> Idea #3:
>>
>> I see that the InterProcessMutex also takes an optional
>> `LockInternalsDriver` argument.  I looked into that code and there I see
>> that it has access to the lock file name.  In addition, in the
>> `getsTheLock` method it creates a PredicateResults object with a
>> `pathToWatch` arg, which sounds promising, but in the default impl with my
>> setup that pathToWatch is null.
>>
>> So I then created my own CustomLockInternalsDriver and set `pathToWatch`
>> to the actual lock path (not sure that would work), but still nothing
>> happens when I delete the file.
>>
>> So then I recorded the path to my lock in the CustomLockInternalsDriver
>> so I could get it in my mainline code and set a WATCH manually/myself.
>> That ends up working.  But that's a lot of work and it's not at all clear
>> what the right solution is and whether it is dangerous to fiddle with
>> creating my own LockInternalsDriver impl.
>>
>> What is the right way to solve this issue?
>>
>>
>> --- How to REPRODUCE ---
>>
>> Here's a link to a gist with my test code:
>> https://gist.github.com/quux00/f6be8fe223a7832ef514
>> Also a gist to my CustomLockInternalsDriver:
>> https://gist.github.com/quux00/ab37cedc46cb5368c853
>>
>> Start up two instances of that code. One will indicate it is "working"
>> and the other "waiting". I then use zkCli.sh to delete the file:
>>
>>     $ ./zkCli.sh
>>     [zk: localhost:2181(CONNECTED) 111] ls /XXX/masterlock
>>     [_c_fd2dcb51-d5e1-4f27-afdf-7a8f75c1b85b-lock-0000000006]
>>     [zk: localhost:2181(CONNECTED) 112] delete /XXX/masterlock/_c_fd2dcb51-d5e1-4f27-afdf-7a8f75c1b85b-lock-0000000006
>>     [zk: localhost:2181(CONNECTED) 113] ls /XXX/masterlock
>>     []
>>
>> The "waiting" process will now create a new lock file and now both
>> processes are "working".
>>
>> Thank you,
>> Michael
>>
>>
>
