zookeeper-user mailing list archives

From Hulunbier <hulunb...@gmail.com>
Subject Re: Getting confused with the "recipe for lock"
Date Tue, 15 Jan 2013 02:28:48 GMT
Thanks Ted,

> And in general, you can't have precise distributed lock control.  There
> will always be a bit of slop.

Yes, I agree with you.

> So decide which penalty is easier to pay.  Do you want "at-most-one" or
> "at-least-one" or something in between?  You can't have "exactly-one" and
> still deal with expected problems like partition or node failure.

Yes again, I feel the same way.

IMHO, a lock (a basic lock, not an R/W lock) should be exclusive by nature.

*If* there really is such a flaw in the recipe, IMHO they should not
claim that "at any snapshot in time no two clients think they hold the same
lock", at least not without some notes; it is ... misleading.
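
To make the point quoted below about the two events (disconnected and session
expired) concrete, here is a minimal sketch in Python of how a lock holder might
react to them under an "at most one holder" policy. It does not use the ZooKeeper
client library: SessionEvent, LockHolder, acquired() and recheck() are
hypothetical names for illustration only, and the node_still_exists flag stands
in for whatever check a real client would make against the server.

```python
from enum import Enum, auto


class SessionEvent(Enum):
    """Hypothetical event names mirroring the states discussed in this thread."""
    CONNECTED = auto()
    DISCONNECTED = auto()
    EXPIRED = auto()


class LockHolder:
    """Toy model of a client that treats a disconnect as losing the lock."""

    def __init__(self):
        self.may_use_resource = False  # does this client think it holds the lock?
        self.session_alive = True      # False once the session has expired

    def acquired(self):
        # Called when the lock recipe says this client is now the holder.
        self.may_use_resource = True

    def on_event(self, event):
        if event is SessionEvent.DISCONNECTED:
            # The ephemeral node may still exist, but the client can no longer
            # know that, so it stops using the protected resource right away.
            self.may_use_resource = False
        elif event is SessionEvent.EXPIRED:
            # The session is gone, so the ephemeral node has been removed.
            self.may_use_resource = False
            self.session_alive = False
        # CONNECTED: deliberately do nothing; ownership must be rechecked.

    def recheck(self, node_still_exists):
        # Resume only if the session survived and the lock node is still there.
        if self.session_alive and node_still_exists:
            self.may_use_resource = True


holder = LockHolder()
holder.acquired()
holder.on_event(SessionEvent.DISCONNECTED)
print(holder.may_use_resource)
```

The sketch is deliberately pessimistic: it gives up the lock on disconnect, which
is the "at-most-one" side of the trade-off Ted describes, at the cost of
periods when nobody holds the lock.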


On Tue, Jan 15, 2013 at 12:05 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Yes.
>
> And in general, you can't have precise distributed lock control.  There
> will always be a bit of slop.
>
> So decide which penalty is easier to pay.  Do you want "at-most-one" or
> "at-least-one" or something in between?  You can't have "exactly-one" and
> still deal with expected problems like partition or node failure.
>
>
> On Mon, Jan 14, 2013 at 7:38 AM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
>
>> There are two events: disconnected and session expired. The ephemeral nodes
>> are removed after the second one. The client receives both. So to
>> implement an "at most one lock holder" scheme, the client owning the lock must
>> assume it has lost lock ownership as soon as it receives the disconnected
>> event. So there is a period of time between disconnect and session expired
>> when no one should have the lock. It's "safety" time to accommodate time
>> shifts, network latencies, and the lock ownership recheck interval (in case
>> the client can't stop using the resource immediately and simply checks
>> regularly whether it still holds the lock).
>>
>>
>>
>> 2013/1/14 Hulunbier <hulunbier@gmail.com>
>>
>> > Hi Vitalii,
>> >
>> > > I don't see why clock must be in sync.
>> >
>> > I don't see any reason to precisely sync the clocks either (but if we
>> > could ... that would be wonderful.).
>> >
>> > By *some constraints on clock drift*, I mean:
>> >
>> > "Every node has a clock, and all clocks increase at the same rate"
>> > or
>> > "the server’s clock advances no faster than a known constant factor
>> > faster than the client’s.".
>> >
>> >
>> > >Also note the difference between disconnected and session
>> > > expired events. This time difference is when client knows "something's
>> > > wrong", but another client did not get a lock yet.
>> >
>> > sorry, but I failed to get your idea well; would you please give me
>> > some further explanation?
>> >
>> >
>> > On Mon, Jan 14, 2013 at 6:37 PM, Vitalii Tymchyshyn <tivv00@gmail.com>
>> > wrote:
>> > > I don't see why clock must be in sync. They are counting time periods
>> > > (timeouts). Also note the difference between disconnected and session
>> > > expired events. This time difference is when client knows "something's
>> > > wrong", but another client did not get a lock yet. You will have problems
>> > > if client can't react (and release resources) between these two events.
>> > >
>> > > Best regards, Vitalii Tymchyshyn
>> > >
>> > >
>> > > 2013/1/13 Hulunbier <hulunbier@gmail.com>
>> > >
>> > >> Thanks Jordan,
>> > >>
>> > >> > Assuming the clocks are in sync between all participants…
>> > >>
>> > >> imho, perfect clock synchronization in a distributed system is very
>> > >> hard (if it can be).
>> > >>
>> > >> > Someone with better understanding of ZK internals can correct me, but
>> > >> > this is my understanding.
>> > >>
>> > >> I think I might have missed some very important and subtle (or
>> > >> obvious?) points of the recipe / ZK protocol.
>> > >>
>> > >> I just cannot believe that there could be such a flaw in the
>> > >> lock recipe for so long without anybody having pointed it out.
>> > >>
>> > >> On Sun, Jan 13, 2013 at 9:31 AM, Jordan Zimmerman
>> > >> <jordan@jordanzimmerman.com> wrote:
>> > >> > On Jan 12, 2013, at 2:30 AM, Hulunbier <hulunbier@gmail.com> wrote:
>> > >> >
>> > >> >> Suppose the network link between client1 and the server is of very low
>> > >> >> quality (high packet loss rate?) but still fully functional.
>> > >> >>
>> > >> >> Client1 may be happily sending heartbeat messages to the server without
>> > >> >> noticing anything; but the ZK server could be unable to receive
>> > >> >> heartbeat messages from client1 for a long period of time, which
>> > >> >> leads the ZK server to time out client1's session and delete the
>> > >> >> ephemeral node
>> > >> >
>> > >> > I believe the heartbeats go both ways. Thus, if the client doesn't hear
>> > >> > from the server it will post a Disconnected event.
>> > >> >
>> > >> >> But I still feel that, no matter how well a ZK application behaves,
>> > >> >> if we use an ephemeral node in the lock recipe, we cannot guarantee that
>> > >> >> "at any snapshot in time no two clients think they hold the same lock",
>> > >> >> which is the fundamental requirement/constraint for a lock.
>> > >> >
>> > >> > Assuming the clocks are in sync between all participants… The server and
>> > >> > the client that holds the lock should determine that there is a
>> > >> > disconnection at nearly the same time. I imagine that there is a certain
>> > >> > amount of time (a few milliseconds) overlap here. But, the next client
>> > >> > wouldn't get the notification immediately anyway. Further, when the next
>> > >> > client gets the notification, it still needs to execute a getChildren()
>> > >> > command, process the results, etc. before it can determine that it has the
>> > >> > lock. That two clients would think they have the lock at the same time is a
>> > >> > vanishingly small possibility. Even if it did happen it would only be for a
>> > >> > few milliseconds at most.
>> > >> >
>> > >> > Someone with better understanding of ZK internals can correct me, but
>> > >> > this is my understanding.
>> > >> >
>> > >> > -Jordan
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Best regards,
>> > >  Vitalii Tymchyshyn
>> >
>>
>>
>>
>> --
>> Best regards,
>>  Vitalii Tymchyshyn
>>
