Date: Thu, 02 Aug 2018 07:29:33 -0000

massakam opened a new issue #2289: Broker suddenly goes down
URL: https://github.com/apache/incubator-pulsar/issues/2289

Recently, brokers in some of our clusters have occasionally gone down. The following is an excerpt from the log of a broker that went down.
```
13:30:11.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 25 seconds
13:30:13.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 23 seconds
13:30:15.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 21 seconds
13:30:17.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 19 seconds
13:30:19.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 17 seconds
13:30:21.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 15 seconds
13:30:23.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 13 seconds
13:30:25.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 11 seconds
13:30:27.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 8 seconds
13:30:29.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 6 seconds
13:30:31.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 4 seconds
13:30:33.466 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 2 seconds
13:30:35.466 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 0 seconds
13:30:37.466 [pulsar-zk-session-watcher-12-1] ERROR o.a.p.z.ZooKeeperSessionWatcher - timeout expired for reconnecting, invoking shutdown service
13:30:37.467 [pulsar-zk-session-watcher-12-1] INFO org.apache.zookeeper.ZooKeeper - Session: 0x164f333639f0269 closed
13:30:37.467 [pulsar-zk-session-watcher-12-1] INFO o.a.p.b.MessagingServiceShutdownHook - Invoking Runtime.halt(-1)
```

The broker service was shut down because it could not reconnect to ZK before the reconnect timeout expired. However, all ZK servers seemed to be working normally at that time. Does anyone know what could cause this?

#### System configuration

- **Cluster-A**
  - **Pulsar version**: 1.22.1-incubating
  - **ZK version**: 3.4.10
- **Cluster-B**
  - **Pulsar version**: 2.0.1-incubating
  - **ZK version**: 3.4.10

ZK settings:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/var/pulsar-zookeeper
clientPort=2181
maxClientCnxns=0
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=xxxx:2182:2183
server.2=xxxx:2182:2183
server.3=xxxx:2182:2183
server.4=xxxx:2182:2183
server.5=xxxx:2182:2183
```
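The `ZooKeeperSessionWatcher` messages in the log follow a simple watchdog pattern: once the ZooKeeper connection is seen as disconnected, a countdown starts, and if the session is not re-established before it runs out, the broker halts. The Java sketch below is a hypothetical model of that pattern, not Pulsar's actual watcher code; the class `ReconnectWatchdog`, its methods, the 30-second requested timeout, and the 2-second check interval (taken from the log timestamps) are all illustrative assumptions. It also shows how a ZooKeeper 3.4 server clamps a client's requested session timeout to `[minSessionTimeout, maxSessionTimeout]`, which default to 2 * `tickTime` and 20 * `tickTime` (4 s to 40 s with `tickTime=2000`).

```java
/**
 * Hypothetical model of a "wait for ZooKeeper to reconnect, else shut down" watchdog.
 * Loosely mirrors the log messages above; NOT Pulsar's ZooKeeperSessionWatcher.
 */
public class ReconnectWatchdog {

    // ZooKeeper 3.4 server defaults: minSessionTimeout = 2 * tickTime,
    // maxSessionTimeout = 20 * tickTime; requested timeouts are clamped to that range.
    static long negotiateSessionTimeout(long requestedMillis, long tickTimeMillis) {
        long min = 2 * tickTimeMillis;
        long max = 20 * tickTimeMillis;
        return Math.max(min, Math.min(max, requestedMillis));
    }

    private final long timeoutMillis;
    private long disconnectedAtMillis = -1; // -1 means "currently connected"

    ReconnectWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called periodically; returns a log-style message describing the current state. */
    String check(boolean connected, long nowMillis) {
        if (connected) {
            disconnectedAtMillis = -1; // session is back: reset the countdown
            return "connected";
        }
        if (disconnectedAtMillis < 0) {
            disconnectedAtMillis = nowMillis; // first check that observed the disconnect
        }
        long remaining = timeoutMillis - (nowMillis - disconnectedAtMillis);
        if (remaining <= 0) {
            // A real broker would halt here; the model only reports it.
            return "timeout expired for reconnecting, invoking shutdown service";
        }
        // Integer division, so e.g. 500 ms remaining is reported as "0 seconds".
        return "waiting to reconnect, time remaining = " + (remaining / 1000) + " seconds";
    }

    public static void main(String[] args) {
        long timeout = negotiateSessionTimeout(30_000, 2_000); // 30000 fits in [4000, 40000]
        ReconnectWatchdog watchdog = new ReconnectWatchdog(timeout);
        // Simulate an outage that never recovers, checked every 2 seconds.
        for (long t = 0; ; t += 2_000) {
            String msg = watchdog.check(false, t);
            System.out.println(t + " ms: " + msg);
            if (msg.startsWith("timeout")) {
                break;
            }
        }
    }
}
```

Running `main` prints one line per simulated check, counting down from 30 seconds, and stops at the first check where the remaining time reaches zero. Under this model, the window a broker gets to reconnect can never exceed the server-side maximum session timeout, so `tickTime` indirectly bounds it.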