Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id D042120049E for ; Thu, 10 Aug 2017 19:22:04 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id CEE7E16BB9D; Thu, 10 Aug 2017 17:22:04 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id ED43616BB9A for ; Thu, 10 Aug 2017 19:22:03 +0200 (CEST) Received: (qmail 90336 invoked by uid 500); 10 Aug 2017 17:22:03 -0000 Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list dev@zookeeper.apache.org Received: (qmail 90324 invoked by uid 99); 10 Aug 2017 17:22:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Aug 2017 17:22:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 7FB1C1A069C for ; Thu, 10 Aug 2017 17:22:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id W-Zt156tGcro for ; Thu, 10 Aug 2017 17:22:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id D95F35FB98 for ; Thu, 10 Aug 2017 17:22:00 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 69E0CE0732 for ; Thu, 10 Aug 2017 17:22:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 27DD224162 for ; Thu, 10 Aug 2017 17:22:00 +0000 (UTC) Date: Thu, 10 Aug 2017 17:22:00 +0000 (UTC) From: "Michael Han (JIRA)" To: dev@zookeeper.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (ZOOKEEPER-2867) an expired ZK session can be re-established MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 10 Aug 2017 17:22:05 -0000 [ https://issues.apache.org/jira/browse/ZOOKEEPER-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121935#comment-16121935 ] Michael Han commented on ZOOKEEPER-2867: ---------------------------------------- bq. That Kafka broker went down around Aug 2, 23:16:11. The expectation is that its ZK session will be closed and the registered ephemeral node will be removed. The expectation will be correct if this Kafka broker really died. But if it was a transient interruption, i.e. network issues between that Kafka broker and ZooKeeper, then the session will be renewed and ephemeral nodes appertain to that session will be intact. From the original log posted, the session {{0x25cd1e82c110001}} was renewed (though the timestamp of the log does not match {{Aug 2, 23:16:11}} so probably we are not talking about same event. In any case, do we have any evidence that indicates the broker in question really died? bq. the following entry doesn't necessarily mean that the ephemeral node is actually removed since it could be restored. Correct, I think this log does not imply the final quorum state W.R.T. the ephemeral nodes, because the logging here was done before a consensus regarding the ephemeral removal proposal was reached. When a server thinks a client is dead and initiates the session closing, it will start removing the ephemerals from its data tree while waiting for the votes of the ephemeral removal proposal, so the ephemeral nodes removal here does not have transactional semantic because it does not consider the proposal results. bq. Is there any logging indicating the restore? When log level is set to DEBUG, {{FinalRequestPrecessor}} will log all requests it processes, including the ephemeral nodes recreate (which is just normal ephemeral node create since it has no notion of a state). You can probably search {{type:create}} in logs. The restore will only happen when a server restarts, or rejoin quorum, and finds out it has out dated state with rest of quorum. Otherwise, the ephemeral nodes will either be removed already at quorum level, or not removed at all (on this specific server) :). In both cases (restarts/rejoin quorum), the server will recover the expected state from rest of quorum by replaying transactions logs. > an expired ZK session can be re-established > ------------------------------------------- > > Key: ZOOKEEPER-2867 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2867 > Project: ZooKeeper > Issue Type: Bug > Affects Versions: 3.4.10 > Reporter: Jun Rao > Attachments: zk.0.formatted, zk.1.formatted > > > Not sure if this is a real bug, but I found an instance when a ZK client seems to be able to renew a session already expired by the ZK server. > From ZK server log, session 25cd1e82c110001 was expired at 22:04:39. > {code:java} > June 27th 2017, 22:04:39.000 INFO org.apache.zookeeper.server.ZooKeeperServer Expiring session 0x25cd1e82c110001, timeout of 12000ms exceeded > June 27th 2017, 22:04:39.001 DEBUG org.apache.zookeeper.server.quorum.Leader Proposing:: sessionid:0x25cd1e82c110001 type:closeSession cxid:0x0 zxid:0x200000fc4 txntype:-11 reqpath:n/a > June 27th 2017, 22:04:39.001 INFO org.apache.zookeeper.server.PrepRequestProcessor Processed session termination for sessionid: 0x25cd1e82c110001 > June 27th 2017, 22:04:39.001 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor Processing request:: sessionid:0x25cd1e82c110001 type:closeSession cxid:0x0 zxid:0x200000fc4 txntype:-11 reqpath:n/a > June 27th 2017, 22:05:20.324 INFO org.apache.zookeeper.server.quorum.Learner Revalidating client: 0x25cd1e82c110001 > June 27th 2017, 22:05:20.324 INFO org.apache.zookeeper.server.ZooKeeperServer Client attempting to renew session 0x25cd1e82c110001 at /100.96.5.6:47618 > June 27th 2017, 22:05:20.325 INFO org.apache.zookeeper.server.ZooKeeperServer Established session 0x25cd1e82c110001 with negotiated timeout 12000 for client /100.96.5.6:47618 > {code} > From ZK client's log, it was able to renew the expired session on 22:05:20. > {code:java} > June 27th 2017, 22:05:18.590 INFO org.apache.zookeeper.ClientCnxn Client session timed out, have not heard from server in 4485ms for sessionid 0x25cd1e82c110001, closing socket connection and attempting reconnect 0 > June 27th 2017, 22:05:18.590 WARN org.apache.zookeeper.ClientCnxn Client session timed out, have not heard from server in 4485ms for sessionid 0x25cd1e82c110001 0 > June 27th 2017, 22:05:19.325 WARN org.apache.zookeeper.ClientCnxn SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/opt/confluent/etc/kafka/server_jaas.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. 0 > June 27th 2017, 22:05:19.326 INFO org.apache.zookeeper.ClientCnxn Opening socket connection to server 100.65.188.168/100.65.188.168:2181 0 > June 27th 2017, 22:05:20.324 INFO org.apache.zookeeper.ClientCnxn Socket connection established to 100.65.188.168/100.65.188.168:2181, initiating session 0 > June 27th 2017, 22:05:20.327 INFO org.apache.zookeeper.ClientCnxn Session establishment complete on server 100.65.188.168/100.65.188.168:2181, sessionid = 0x25cd1e82c110001, negotiated timeout = 12000 0 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)