From: Ivan Kelly
Date: Wed, 15 Jul 2015 20:55:28 +0000
Subject: Re: locking/leader election and dealing with session loss
To: Jordan Zimmerman, Alexander Shraer, user@zookeeper.apache.org
Cc: "zookeeper-user@hadoop.apache.org"

> "at any snapshot in time no two clients think they hold the same lock"

According to the ZK service. But communication between the service and
client takes time.

-Ivan

On Wed, Jul 15, 2015 at 10:54 PM Ivan Kelly wrote:

> Jordan, imagine you have a node which is leader using the hbase example. A
> client makes some request to the leader, which processes the request, lines
> up a write to the state in hbase, and promptly goes into a 30 second gc
> pause just before it flushes the socket. During the 30 second pause another
> node takes over as leader and starts writing to the state. Now, when the
> pause ends, what will stop the write from the first leader being flushed to
> the socket and then hitting hbase?
>
> -Ivan
>
> On Wed, Jul 15, 2015 at 10:26 PM Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> I think we may be talking past each other here. My contention (and the ZK
>> docs agree BTW) is that, properly written and configured, "at any
>> snapshot in time no two clients think they hold the same lock". How your
>> application acts on that fact is another thing. You might need sequence
>> numbers, you might not.
>>
>> -Jordan
>>
>>
>> On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shralex@gmail.com)
>> wrote:
>>
>> Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
>> link
>> <http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf>
>>
>> it suggests 2 ways in which the storage can support lock generations and
>> proposes an alternative for the case where the storage can't be made
>> aware of lock generations.
>>
>> On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>>
>> > Ivan, I just read the blog and I still don’t see how this can happen.
>> > Sorry if I’m being dense. I’d appreciate a discussion on this. In your
>> > blog you state: "when ZooKeeper tells you that you are leader, there’s no
>> > guarantee that there isn’t another node that 'thinks' its the leader."
>> > However, given a long enough session time (I usually recommend 30–60
>> > seconds), I don’t see how this can happen. The client itself determines
>> > that there is a network partition when there is no heartbeat success. The
>> > heartbeat is a fraction of the session timeout. Once the heartbeat fails,
>> > the client must assume it no longer has the lock. Another client cannot
>> > take over the lock until, at minimum, session timeout. So, how then can
>> > there be two leaders?
>> >
>> > -Jordan
>> >
>> > On July 15, 2015 at 2:23:12 PM, Ivan Kelly (ivank@apache.org) wrote:
>> >
>> > I blogged about this exact problem a couple of weeks ago [1]. I give an
>> > example of how split brain can happen in a resource under a zk lock
>> > (Hbase in this case). As Camille says, sequence numbers ftw. I'll add
>> > that the data store has to support them though, which not all do (in fact
>> > I've yet to see one in the wild that does). I've implemented a prototype
>> > that works with hbase[2] if you want to see what it looks like.
>> >
>> > -Ivan
>> >
>> > [1] https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
>> > [2] https://github.com/ivankelly/hbase-exclusive-writer
>> >
>> > On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta wrote:
>> >
>> > > Jordan, I mean the client gives up the lock and stops working on the
>> > > shared resource.
>> > > So when zookeeper is unavailable, no one is working on any shared
>> > > resource (because they cannot distinguish network partition from
>> > > zookeeper DEAD scenario).
>> > >
>> > > --
>> > > View this message in context:
>> > > http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581293.html
>> > > Sent from the zookeeper-user mailing list archive at Nabble.com.
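
For readers following the thread, here is a minimal sketch of the "sequence numbers" / lock generation idea, replaying the GC-pause scenario described above. It is only an illustration under stated assumptions: `FencedStore` is a hypothetical in-memory stand-in for the data store (it is not HBase, ZooKeeper, or the prototype linked in the thread), and the generation numbers are assumed to come from some monotonically increasing counter handed out along with the lock.

```python
class FencedStore:
    """Hypothetical store that rejects writes carrying a stale lock generation."""

    def __init__(self):
        self.highest_generation = 0  # highest lock generation seen so far
        self.value = None

    def write(self, generation, value):
        # A write tagged with an older generation than one already seen
        # comes from a previous lock holder, so it is refused.
        if generation < self.highest_generation:
            return False
        self.highest_generation = generation
        self.value = value
        return True


store = FencedStore()

# Assumed generations: each new lock holder receives a larger number.
old_leader_generation = 1
new_leader_generation = 2

# The old leader lines up a write, then stalls (e.g. a long GC pause)
# before the write reaches the store.
delayed_write = (old_leader_generation, "state from old leader")

# Meanwhile the new leader takes over and writes.
accepted_new = store.write(new_leader_generation, "state from new leader")

# The old leader wakes up and its delayed write finally arrives.
accepted_old = store.write(*delayed_write)

print(accepted_new, accepted_old, store.value)
```

The point of the sketch is that the check happens in the store rather than in the client: the paused leader cannot know it lost the lock, so only the resource being written to can refuse the late write. This matches Ivan's remark that the data store has to support sequence numbers for the approach to work.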