Subject: Re: disconnected events and session expiration
From: Simon
Date: Sun, 13 Sep 2015 13:28:52 +0200
To: Jordan Zimmerman
Cc: user@zookeeper.apache.org

That didn’t really answer any of my questions.

If I own a lock, I am entitled to do some work exclusively. No one else should be doing that work. If I get disconnected or the session times out I have to stop working. Somebody else will take over the work in a short time.

If I understood the programmer’s guide correctly, the expired event will not be delivered to me until I reconnect. Correct? So, I have to use the disconnected event to initiate a graceful stop. Stopping work might take some time, e.g. because I am doing a REST service call that takes up to 20s. Let’s say doing the call twice leads to data corruption in the backend service (e.g. HTTP POST, which is non-idempotent). So, ideally, if I am still running, I should try my best to complete normally. If the state of the work units is kept in ZK, I cannot update the state anyway. If I store it in some other datastore, I might or might not be able to update the state (depending on how the network has been partitioned).

The more I think about it, the harder it seems to get this stuff working reliably. What if my node crashes? I cannot complete my work normally. So, whoever takes over my work will try to redo it anyway. Either the receiver is made idempotent (which is not always possible) or the new work owner needs to be aware of the aborted task and be extra cautious, e.g. by checking whether the work unit has completed or not. It seems to me that making the “crash” case the default (i.e. “crash” the worker thread whenever a disconnected event is received) is the best solution. Then I am forced to make the crash case robust. I guess that’s what some people call “crash-only design”.

Simon

> On 13 Sep 2015, at 03:19, Jordan Zimmerman wrote:
>
> I used to advise that people treat Disconnected the same as session loss as it’s safer. But, you can also set a timer when Disconnected is received and when your session timeout elapses you can then consider session loss (note, use the negotiated value from the ZK handle). FYI - version 3.0.0 of Apache Curator will have an option to choose this alternate method.
>
> -Jordan
>
> On September 12, 2015 at 4:47:46 PM, Simon (cocoa@gmx.ch) wrote:
>
>> Hi
>>
>> I am trying to get a better understanding of ZooKeeper and how it should be used. Let’s talk about the lock recipe (http://zookeeper.apache.org/doc/r3.4.6/recipes.html#sc_recipes_Locks).
>>
>> - X acquires the lock
>> - X does some long-running work (longer than the session timeout)
>> - X gets partitioned away from the quorum while it is doing that work
>> - after some time (determined by the timeout passed to ZK) Y will acquire the lock
>>
>> In that situation both X and Y are holding the lock (unless X is acting properly). If I understand the documentation correctly (http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html#ch_zkSessions), X would receive a disconnected event in that situation (but not an expired event unless it successfully reconnects). So, X should stop doing the work it is doing until it gets reconnected. How much time does X have to stop the work it is doing? I.e., how long does it take from the disconnected event being sent to X until the ephemeral node used for the lock expires? Having two clients inside a critical section protected by a lock would not be a good idea.
>>
>> Regards,
>> Simon
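
A minimal sketch of the timer approach Jordan describes in the thread above: note the time at which Disconnected arrives, and treat the lock as lost once the negotiated session timeout has elapsed without a reconnect. Everything named here (`SessionLossGuard`, `margin`, the `on_*` methods) is hypothetical and for illustration only; none of it is ZooKeeper or Curator API. It assumes the caller wires the three `on_*` methods to its client's connection-state events and passes in the session timeout negotiated with the server, not the requested one. A `margin` below 1.0 is a conservative choice, on the assumption that some time may already have passed on the server side before the client notices the disconnect.

```python
import time


class SessionLossGuard:
    """Hypothetical helper: decides when a disconnected client should
    assume its session (and therefore its lock) is gone."""

    def __init__(self, session_timeout_s, margin=1.0, clock=time.monotonic):
        # session_timeout_s: the timeout negotiated with the server, in seconds.
        # margin: fraction of that timeout to wait before assuming session loss.
        self._budget = session_timeout_s * margin
        self._clock = clock
        self._disconnected_at = None  # None means "currently connected"
        self._expired = False

    def on_disconnected(self):
        # Keep the first timestamp if several Disconnected events arrive in a row.
        if self._disconnected_at is None:
            self._disconnected_at = self._clock()

    def on_connected(self):
        # Reconnected within the same session: stop the countdown.
        self._disconnected_at = None

    def on_expired(self):
        # The server confirmed expiry; this is permanent for the session.
        self._expired = True

    def lock_presumed_lost(self):
        if self._expired:
            return True
        if self._disconnected_at is None:
            return False
        return self._clock() - self._disconnected_at >= self._budget


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
guard = SessionLossGuard(session_timeout_s=10.0, clock=lambda: now[0])
guard.on_disconnected()
now[0] = 4.0
print(guard.lock_presumed_lost())  # False: still inside the timeout
now[0] = 10.0
print(guard.lock_presumed_lost())  # True: timeout elapsed, stop working
```

A worker holding the lock would call `lock_presumed_lost()` before each non-idempotent step (such as the HTTP POST in Simon's example) and refuse to start the step once it returns True. This does not help with a call that is already in flight, which is why the takeover side still has to cope with half-finished work, as Simon argues.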