From notifications-return-2131-archive-asf-public=cust-asf.ponee.io@zookeeper.apache.org Fri Oct 4 08:56:17 2019
Return-Path:
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 2F705180651 for ; Fri, 4 Oct 2019 10:56:17 +0200 (CEST)
Received: (qmail 31249 invoked by uid 500); 4 Oct 2019 08:56:16 -0000
Mailing-List: contact notifications-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@zookeeper.apache.org
Delivered-To: mailing list notifications@zookeeper.apache.org
Received: (qmail 31233 invoked by uid 99); 4 Oct 2019 08:56:16 -0000
Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 Oct 2019 08:56:16 +0000
From: GitBox
To: notifications@zookeeper.apache.org
Subject: [GitHub] [zookeeper] symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network
Message-ID: <157017937649.30950.10125175467104634977.gitbox@gitbox.apache.org>
Date: Fri, 04 Oct 2019 08:56:16 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#discussion_r331403117

##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java
##########
@@ -418,66 +426,108 @@ public boolean isQuorumSynced(QuorumVerifier qv) {
     class LearnerCnxAcceptor extends ZooKeeperCriticalThread {
-        private volatile boolean stop = false;
+        private final AtomicBoolean stop = new AtomicBoolean(false);
+        private final AtomicBoolean fail = new AtomicBoolean(false);
-        public LearnerCnxAcceptor() {
-            super("LearnerCnxAcceptor-" + ss.getLocalSocketAddress(), zk.getZooKeeperServerListener());
+        LearnerCnxAcceptor() {
+            super("LearnerCnxAcceptor-" + serverSockets.stream()
+                      .map(ServerSocket::getLocalSocketAddress)
+                      .map(Objects::toString)
+                      .collect(Collectors.joining(",")),
+                  zk.getZooKeeperServerListener());
         }
         @Override
         public void run() {
-            try {
-                while (!stop) {
-                    Socket s = null;
-                    boolean error = false;
-                    try {
-                        s = ss.accept();
-
-                        // start with the initLimit, once the ack is processed
-                        // in LearnerHandler switch to the syncLimit
-                        s.setSoTimeout(self.tickTime * self.initLimit);
-                        s.setTcpNoDelay(nodelay);
-
-                        BufferedInputStream is = new BufferedInputStream(s.getInputStream());
-                        LearnerHandler fh = new LearnerHandler(s, is, Leader.this);
-                        fh.start();
-                    } catch (SocketException e) {
-                        error = true;
-                        if (stop) {
-                            LOG.info("exception while shutting down acceptor: " + e);
-
-                            // When Leader.shutdown() calls ss.close(),
-                            // the call to accept throws an exception.
-                            // We catch and set stop to true.
-                            stop = true;
-                        } else {
-                            throw e;
-                        }
-                    } catch (SaslException e) {
-                        LOG.error("Exception while connecting to quorum learner", e);
-                        error = true;
-                    } catch (Exception e) {
-                        error = true;
+            if (!stop.get() && !serverSockets.isEmpty()) {
+                ExecutorService executor = Executors.newFixedThreadPool(serverSockets.size());
+                CountDownLatch latch = new CountDownLatch(serverSockets.size());
+
+                serverSockets.forEach(serverSocket ->
+                        executor.submit(new LearnerCnxAcceptorHandler(serverSocket, latch)));
+
+                try {
+                    latch.await();
+                } catch (InterruptedException ie) {
+                    LOG.error("Interrupted while sleeping. Ignoring exception", ie);
+                } finally {
+                    closeSockets();

Review comment:
Thanks, nice catch. I will use executor.shutdownNow() to terminate the tasks immediately. I was thinking of adding a timeout first to wait for the tasks to shut down gracefully in case of an InterruptedException, but I decided against it.
At this point all the tasks are finished (or will finish anyway once the sockets are closed), so I think it is better to finish everything quickly so that a new leader election can take place.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services
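
For illustration, the shutdown approach described in the review comment (close the sockets, then call executor.shutdownNow() without a graceful timeout) could look roughly like the sketch below. It is not the code from the pull request: the class AcceptorShutdownSketch, the helper acceptTask (a stand-in for LearnerCnxAcceptorHandler) and the method runUntilInterrupted are hypothetical names, and the interrupt is simulated by setting the current thread's interrupt flag before waiting on the latch.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AcceptorShutdownSketch {

    /** Hypothetical stand-in for LearnerCnxAcceptorHandler: blocks in accept() until the socket closes. */
    static Runnable acceptTask(ServerSocket socket, CountDownLatch latch) {
        return () -> {
            try {
                socket.accept().close();
            } catch (IOException e) {
                // Expected once the server socket has been closed.
            } finally {
                latch.countDown();
            }
        };
    }

    /** Returns true if every task finished promptly after the interrupted wait. */
    static boolean runUntilInterrupted(int socketCount) throws IOException, InterruptedException {
        List<ServerSocket> sockets = new ArrayList<>();
        for (int i = 0; i < socketCount; i++) {
            sockets.add(new ServerSocket(0)); // port 0 lets the OS pick any free port
        }
        ExecutorService executor = Executors.newFixedThreadPool(socketCount);
        CountDownLatch latch = new CountDownLatch(socketCount);
        sockets.forEach(s -> executor.submit(acceptTask(s, latch)));

        // Assumption for the demo: simulate an interrupt arriving while this thread waits.
        Thread.currentThread().interrupt();
        try {
            latch.await();
        } catch (InterruptedException ie) {
            // As in the diff, the interrupt is reported and otherwise ignored.
            System.out.println("Interrupted while waiting: " + ie);
        } finally {
            for (ServerSocket s : sockets) {
                s.close(); // unblocks any accept() still in progress
            }
            executor.shutdownNow(); // no graceful wait: interrupt workers, drop queued tasks
        }
        // Verification only: with the sockets closed, the tasks should already be exiting.
        return executor.awaitTermination(5, TimeUnit.SECONDS) && latch.getCount() == 0;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("terminated: " + runUntilInterrupted(3));
    }
}
```

Closing the sockets is what actually unblocks the accept() calls in this sketch; shutdownNow() then only has to stop the pool itself, which is why a graceful timeout adds little here.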