curator-user mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: InterProcessMutex doesn't detect deletion of lock file
Date Tue, 20 Jan 2015 17:48:24 GMT
In the many years of Curator’s existence, no one that I know of has had an issue with this. ZooKeeper
is very robust, and nodes do not get deleted abnormally like this. You are posing a hypothetical
situation. It’s not reasonable to handle every single edge case. This would be the equivalent
of someone going into the production database and arbitrarily deleting records. The locking
code is already incredibly complicated, and I wouldn’t want to burden it with this new behavior
and overhead. However, if you can make it work reasonably, please provide a PR and the committers
will look at it.

-Jordan



On January 20, 2015 at 12:38:36 PM, Michael Peterson (quux00@gmail.com) wrote:

> But manually deleting the lock node is not normal behavior.
> It should never happen in production.

I agree that it would be abnormal.  But abnormal doesn't mean impossible.

> Can you explain the scenario in more detail?

There may be a bug in ZK (now or in the future) that in some rare cases deletes a file when
it should not.

Or a team might be in the practice of managing their ZK ensemble via the ZK CLI and someone might
accidentally type:
"delete /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000"

rather than

"get /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000". 

Or even worse, type
"rmr /XXX/masterlock". 

(I've seen a similar manual mistake made on the HDFS of a production Hadoop system, where
months of data were deleted by someone pressing up-arrow too fast and issuing a -rmr instead of an -ls command.)

For a system where I need to be absolutely sure that I and only I have the lock, this abnormal
"backdoor" deletion possibility worries me.  To build a truly robust system, you have to
handle all the possibilities you can.

The https://issues.apache.org/jira/browse/CURATOR-171 issue referenced earlier seems to be
arguing the same thing.


On Tue, Jan 20, 2015 at 11:42 AM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
But manually deleting the lock node is not normal behavior. It should never happen in production.
Can you explain the scenario in more detail? 

-JZ

