From: "maoling (JIRA)"
To: dev@zookeeper.apache.org
Date: Sat, 16 Mar 2019 07:00:03 +0000 (UTC)
Subject: [jira] [Commented] (ZOOKEEPER-3315) Exceptions in callbacks should be handlable by the application

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794166#comment-16794166 ]

maoling commented on ZOOKEEPER-3315:
------------------------------------

[~steven-usabilla] Could you please help us fix this issue? The contributor guideline is [here|https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute].

> Exceptions in callbacks should be handlable by the application
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3315
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3315
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Steven McDonald
>            Priority: Major
>         Attachments: ExceptionTest.java
>
>
> Hi,
> In [KAFKA-7898|https://issues.apache.org/jira/browse/KAFKA-7898], a {{NullPointerException}} in a {{MultiCallback}} caused a Kafka cluster to become unhealthy in such a way that manual intervention was needed to recover.
> The cause of this particular {{NullPointerException}} is fixed in Kafka 2.2.x (with a proposed documentation update in [ZOOKEEPER-3314|https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3314]), but I am interested in improving the resiliency of Kafka (and by extension the Zookeeper client library) against such bugs.
> Quoting the stack trace from KAFKA-7898:
> {code}
> [2019-02-05 14:28:12,525] ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
> java.lang.NullPointerException
>     at kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
> {code}
> The "caught unexpected throwable" message comes from [within the Zookeeper client library|https://github.com/apache/zookeeper/blob/release-3.4.13/src/java/main/org/apache/zookeeper/ClientCnxn.java#L641]. I think that try/catch is pointless, because removing it causes the message to instead be logged [here|https://github.com/apache/zookeeper/blob/release-3.4.13/src/java/main/org/apache/zookeeper/server/ZooKeeperThread.java#L60], with no discernible change in behaviour otherwise. Explicitly exiting the {{EventThread}} when this happens does not help (I don't think it gets restarted).
> This is especially problematic with distributed applications, since they are generally designed to tolerate the loss of a node, so it is preferable to have the application be allowed to terminate itself rather than risk inconsistent state.
> I am attaching a simple Zookeeper client which does nothing except throw a {{NullPointerException}} as soon as it receives a callback, to illustrate the problem.
> Running this results in:
> {code}
> 232 [main-EventThread] ERROR org.apache.zookeeper.ClientCnxn - Error while calling watcher
> java.lang.NullPointerException
>     at ExceptionTest.process(ExceptionTest.java:31)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:539)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:514)
> {code}
> This comes from [here|https://github.com/apache/zookeeper/blob/7256d01a26412cd35a46edab6de9ac8c5adf5bb3/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L541], which simply logs the occurrence but provides no way for my application to handle the failure.
> I suspect the best approach here might be to allow the application to register a callback to notify it of unhandlable exceptions within the Zookeeper library, since Zookeeper has no way of knowing what approach makes sense for the application. Of course, this is already technically possible in this case by having the application catch all exceptions in every callback, but that doesn't seem very practical.
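
The workaround mentioned in the issue's closing paragraph (catching every exception inside each callback) can be written once as a wrapper instead of being repeated in every callback. The following is only a minimal sketch under stated assumptions: {{Callback}} is a hypothetical stand-in for the client library's callback types (for example a watcher), and {{GuardedCallback}} and {{failureHandler}} are illustrative names. None of them are part of the ZooKeeper API, and this is not the change proposed for the library itself.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class GuardedCallbackDemo {

    /** Hypothetical stand-in for a client-library callback such as a watcher. */
    interface Callback<E> {
        void process(E event);
    }

    /**
     * Wraps a callback so that anything it throws is handed to an
     * application-supplied handler instead of escaping into the
     * library's event thread, where it would only be logged.
     */
    static final class GuardedCallback<E> implements Callback<E> {
        private final Callback<E> delegate;
        private final Consumer<Throwable> failureHandler;

        GuardedCallback(Callback<E> delegate, Consumer<Throwable> failureHandler) {
            this.delegate = delegate;
            this.failureHandler = failureHandler;
        }

        @Override
        public void process(E event) {
            try {
                delegate.process(event);
            } catch (Throwable t) {
                // The application decides what happens next: log, alert, or shut down.
                failureHandler.accept(t);
            }
        }
    }

    public static void main(String[] args) {
        // Records the failure so the demo can show that the handler was invoked.
        AtomicReference<Throwable> seen = new AtomicReference<>();

        // A callback with a bug, similar in spirit to the attached test client.
        Callback<String> buggy = event -> {
            throw new NullPointerException("bug while handling " + event);
        };

        Callback<String> guarded = new GuardedCallback<>(buggy, seen::set);
        guarded.process("NodeCreated");

        System.out.println("handler saw: " + seen.get());
    }
}
```

In a real client the handler would more likely trigger a controlled shutdown than record the exception, since the issue argues that only the application can know which reaction is appropriate.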