Date: Thu, 16 Jul 2015 08:02:06 -0500
From: Jordan Zimmerman
To: Ivan Kelly, user@zookeeper.apache.org
Cc: zookeeper-user@hadoop.apache.org
Subject: Re: locking/leader election and dealing with session loss

Of course there are a myriad theoretical possibilities. But I don’t
believe any of what you’ve mentioned will happen in production. For any
reasonable case, you can be guaranteed that no two processes will
consider themselves lock holders at the same instant in time.

-Jordan

On July 16, 2015 at 7:58:06 AM, Ivan Kelly (ivank@apache.org) wrote:

On Thu, Jul 16, 2015 at 1:38 PM Jordan Zimmerman wrote:

> Are you really seeing 30s gc pauses in production? If so, then of course
> this could happen. However, if your application can tolerate a 30s pause
> (which is hard to believe) then your session timeout is too low. The point
> of the session timeout is to have enough coverage. So, if your app has 30
> seconds allowable pauses your session timeout would have to be much longer.

GC is just an example. There's other ways the same scenario could happen.
The machine could swap out the process due to load. Someone could do
something stupid in the zookeeper event thread and the session expired
event is delayed. The state update could have hit the ip stack during
network partition, and the process then got wedged. The state update packet
could have hit the network and been routed via the moon. The clock could
break.

If you are relying on a timer on the zk client to maintain a guarantee,
then you really aren't giving any guarantee because the zk client doesn't
have control over all the things that could go wrong.

-Ivan