zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Getting confused with the "recipe for lock"
Date Mon, 14 Jan 2013 16:05:14 GMT
Yes.

And in general, you can't have precise distributed lock control.  There
will always be a bit of slop.

So decide which penalty is easier to pay.  Do you want "at-most-one" or
"at-least-one" or something in between?  You can't have "exactly-one" and
still deal with expected problems like partition or node failure.
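
The trade-off can be sketched in a few lines. This is only an illustration, not ZooKeeper client code: `SessionEvent` and `LockView` are hypothetical names, loosely modelled on the Disconnected and Expired session states discussed further down the thread. An "at-most-one" holder gives up its claim as soon as it is disconnected; an "at-least-one" holder keeps it until the session is known to have expired.

```python
from enum import Enum


class SessionEvent(Enum):
    # Hypothetical stand-ins for the client session states named in the thread.
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    EXPIRED = "expired"


class LockView:
    """What one client believes about its own lock ownership."""

    def __init__(self, at_most_one: bool):
        self.at_most_one = at_most_one
        self.believes_held = False

    def acquired(self) -> None:
        self.believes_held = True

    def on_event(self, event: SessionEvent) -> None:
        if event is SessionEvent.EXPIRED:
            # The ephemeral node is gone, so the lock is lost under either policy.
            self.believes_held = False
        elif event is SessionEvent.DISCONNECTED and self.at_most_one:
            # Conservative policy: stop using the resource before the server
            # has had a chance to expire the session and hand the lock on.
            self.believes_held = False
        # On CONNECTED nothing is assumed: a client that gave up its claim
        # would have to re-check its lock node before believing it holds it.
```

With `at_most_one=True` there can be a window in which nobody holds the lock; with `at_most_one=False` there can be a window in which two clients both believe they hold it.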


On Mon, Jan 14, 2013 at 7:38 AM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:

> There are two events: disconnected and session expired. The ephemeral nodes
> are removed after the second one. The client receives both. So, to
> implement an "at most one lock holder" scheme, the client owning the lock must
> assume it has lost lock ownership as soon as it receives the disconnected
> event. So there is a period of time between disconnect and session expiry
> when no one should hold the lock. It is "safety" time to accommodate clock
> shifts, network latencies, and the lock ownership recheck interval (in case
> the client can't stop using the resource immediately and simply checks
> regularly whether it still holds the lock).
>
>
>
> 2013/1/14 Hulunbier <hulunbier@gmail.com>
>
> > Hi Vitalii,
> >
> > > I don't see why clock must be in sync.
> >
> > I don't see any reason to precisely sync the clocks either (but if we
> > could ... that would be wonderful.).
> >
> > By *some constraints on clock drift*, I mean:
> >
> > "Every node has a clock, and all clocks increase at the same rate"
> > or
> > "the server’s clock advances no faster than a known constant factor
> > faster than the client’s.".
> >
> >
> > > Also note the difference between disconnected and session
> > > expired events. This time difference is when client knows "something's
> > > wrong", but another client did not get a lock yet.
> >
> > Sorry, but I did not quite follow your idea; could you please explain
> > it a bit further?
> >
> >
> > On Mon, Jan 14, 2013 at 6:37 PM, Vitalii Tymchyshyn <tivv00@gmail.com>
> > wrote:
> > > I don't see why clock must be in sync. They are counting time periods
> > > (timeouts). Also note the difference between disconnected and session
> > > expired events. This time difference is when client knows "something's
> > > wrong", but another client did not get a lock yet. You will have problems
> > > if the client can't react (and release resources) between these two events.
> > >
> > > Best regards, Vitalii Tymchyshyn
> > >
> > >
> > > 2013/1/13 Hulunbier <hulunbier@gmail.com>
> > >
> > >> Thanks Jordan,
> > >>
> > >> > Assuming the clocks are in sync between all participants…
> > >>
> > >> IMHO, perfect clock synchronization in a distributed system is very
> > >> hard (if it is possible at all).
> > >>
> > >> > Someone with better understanding of ZK internals can correct me, but
> > >> > this is my understanding.
> > >>
> > >> I think I might have missed some very important and subtle (or
> > >> obvious?) points of the recipe / ZK protocol.
> > >>
> > >> I just cannot believe that there could be such a flaw in the
> > >> lock recipe for so long without anybody having pointed it out.
> > >>
> > >> On Sun, Jan 13, 2013 at 9:31 AM, Jordan Zimmerman
> > >> <jordan@jordanzimmerman.com> wrote:
> > >> > On Jan 12, 2013, at 2:30 AM, Hulunbier <hulunbier@gmail.com> wrote:
> > >> >
> > >> >> Suppose the network link between client1 and the server is of very low
> > >> >> quality (high packet loss rate?) but still fully functional.
> > >> >>
> > >> >> Client1 may be happily sending heartbeat messages to the server without
> > >> >> noticing anything, but the ZK server could be unable to receive
> > >> >> heartbeat messages from client1 for a long period of time, which
> > >> >> leads the ZK server to time out client1's session and delete the
> > >> >> ephemeral node.
> > >> >
> > >> > I believe the heartbeats go both ways. Thus, if the client doesn't hear
> > >> > from the server it will post a Disconnected event.
> > >> >
> > >> >> But I still feel that, no matter how well a ZK application behaves,
> > >> >> if we use an ephemeral node in the lock recipe, we cannot guarantee that
> > >> >> "at any snapshot in time no two clients think they hold the same lock",
> > >> >> which is the fundamental requirement/constraint for a lock.
> > >> >
> > >> > Assuming the clocks are in sync between all participants… The server and
> > >> > the client that holds the lock should determine that there is a
> > >> > disconnection at nearly the same time. I imagine that there is a certain
> > >> > amount of time (a few milliseconds) overlap here. But, the next client
> > >> > wouldn't get the notification immediately anyway. Further, when the next
> > >> > client gets the notification, it still needs to execute a getChildren()
> > >> > command, process the results, etc. before it can determine that it has the
> > >> > lock. That two clients would think they have the lock at the same time is a
> > >> > vanishingly small possibility. Even if it did happen it would only be for a
> > >> > few milliseconds at most.
> > >> >
> > >> > Someone with better understanding of ZK internals can correct me, but
> > >> > this is my understanding.
> > >> >
> > >> > -Jordan
> > >>
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >  Vitalii Tymchyshyn
> >
>
>
>
> --
> Best regards,
>  Vitalii Tymchyshyn
>
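
The clock-drift constraint quoted in the thread ("the server's clock advances no faster than a known constant factor faster than the client's") can be turned into a small calculation. The sketch below is an assumption-laden illustration, not part of ZooKeeper: `safe_hold_deadline` and its parameters are hypothetical names, and it assumes the server never expires a session earlier than the session timeout (measured on the server's clock) after the last heartbeat it received.

```python
def safe_hold_deadline(last_heartbeat_sent: float,
                       session_timeout: float,
                       max_drift_factor: float,
                       margin: float = 0.0) -> float:
    """Latest client-local time at which the client may still assume it holds the lock.

    Assumptions (not taken from ZooKeeper itself):
      * the server clock runs at most `max_drift_factor` (>= 1.0) times as
        fast as the client clock;
      * the server receives a heartbeat no earlier than the client sent it;
      * the server waits at least `session_timeout` of its own time after
        the last received heartbeat before expiring the session.
    """
    if max_drift_factor < 1.0:
        raise ValueError("max_drift_factor must be >= 1.0")
    if session_timeout <= 0 or margin < 0:
        raise ValueError("session_timeout must be positive and margin non-negative")
    # `session_timeout` seconds on the server clock take at least
    # session_timeout / max_drift_factor seconds on the client clock.
    return last_heartbeat_sent + session_timeout / max_drift_factor - margin
```

For example, with a 10-second session timeout, a drift factor of 1.25 and a 1-second margin, a client whose last heartbeat went out at local time 0 should stop relying on the lock by local time 7. No absolute clock synchronization is needed, only a bound on the relative clock rates.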
