From: GitBox
To: commits@pulsar.apache.org
Reply-To: dev@pulsar.apache.org
Subject: [GitHub] [pulsar] massakam opened a new issue #4635: Bookie down causes deadlock in broker
Message-ID: <156171701825.8409.13421276410840294130.gitbox@gitbox.apache.org>
Date: Fri, 28 Jun 2019 10:16:58 -0000

massakam opened a new issue #4635: Bookie down causes deadlock in broker
URL: https://github.com/apache/pulsar/issues/4635

One of the bookie servers in our cluster went down due to a hardware failure. At the same time, the broker server went down as well. The broker's log contained messages indicating that it could not connect to ZooKeeper. I think this is due to a deadlock.
```
19:38:55.846 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 25 seconds
19:38:57.846 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 23 seconds
19:38:59.847 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 21 seconds
19:39:01.847 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 19 seconds
19:39:03.847 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 16 seconds
19:39:05.847 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 14 seconds
19:39:07.848 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 12 seconds
19:39:09.848 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 10 seconds
19:39:11.848 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 8 seconds
19:39:13.849 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 6 seconds
19:39:15.849 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 4 seconds
19:39:17.849 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 2 seconds
19:39:19.849 [pulsar-zk-session-watcher-5-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 0 seconds
19:39:21.850 [pulsar-zk-session-watcher-5-1] ERROR o.a.p.z.ZooKeeperSessionWatcher - timeout expired for reconnecting, invoking shutdown service
```

Below is a thread dump taken just before the broker shut down.

[broker_threaddump.txt](https://github.com/apache/pulsar/files/3338708/broker_threaddump.txt)

This behavior is similar to #3566. However, the broker is running Pulsar 2.3.2, in which that earlier bug should already have been fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services