Date: Fri, 13 Apr 2012 10:15:34 -0500
Subject: Re: Input on a change
From: Scott Fines
To: user@zookeeper.apache.org
Cc: zookeeper-user@hadoop.apache.org, zookeeper-dev@hadoop.apache.org

On some JVMs (HotSpot for sure, but maybe others too?) there are JVM flags for performing actions on OutOfMemoryErrors (-XX:OnOutOfMemoryError="...", -XX:+HeapDumpOnOutOfMemoryError, and maybe some others that I can't remember off the top of my head). Will these triggers still be fired, or will the catch-all prevent them?

I'm still +1 for the change no matter what, but it's probably a good idea to mention that in the docs if they don't work.

Scott

On Fri, Apr 13, 2012 at 10:09 AM, Camille Fournier wrote:
> Hi everyone,
>
> I'm trying to evaluate a patch that Jeremy Stribling has submitted, and I'd
> like some feedback from the user base on it.
> https://issues.apache.org/jira/browse/ZOOKEEPER-1442
>
> The current behavior of ZK when we get an uncaught exception is to log it
> and try to move on. This is arguably not the right thing to do, and will
> possibly cause ZK to limp along with a bad VM (say, in an OOM state) for
> longer than it should.
> The patch proposes that when we get an instance of java.lang.Error, we
> should do a System.exit to fast-fail the process.
> With the possible exception of ThreadDeath (which may or may not be an
> unrecoverable system state depending on the thread), I think this makes
> sense, but I would like to hear from others if they have an opinion. I
> think it's better to kill the process and let your monitoring services
> detect process death (and thus restart) than possibly linger unresponsive
> for a while. Are there scenarios that we're missing where this error can
> occur and you wouldn't want the process killed?
>
> Thanks for your feedback,
>
> Camille
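
For readers following the thread, here is a rough sketch of the fail-fast behavior described in the quoted message: exit on any java.lang.Error except ThreadDeath, and only log everything else. It is not the ZOOKEEPER-1442 patch itself. The class name FailFastHandler, the isFatal helper, the injected exit action, and the exit code 1 are all assumptions made for illustration.

```java
import java.util.function.IntConsumer;

// Hypothetical illustration only; not the code from the ZOOKEEPER-1442 patch.
public class FailFastHandler implements Thread.UncaughtExceptionHandler {

    // The exit action is injected so the policy can be exercised without
    // actually terminating the JVM (for example, in a unit test).
    private final IntConsumer exitAction;

    public FailFastHandler(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    // Treat every java.lang.Error as fatal except ThreadDeath, which the
    // thread describes as possibly recoverable depending on the thread.
    @SuppressWarnings("removal") // ThreadDeath is deprecated in recent JDKs
    static boolean isFatal(Throwable t) {
        return t instanceof Error && !(t instanceof ThreadDeath);
    }

    @Override
    public void uncaughtException(Thread thread, Throwable t) {
        // Always log first, so the cause is visible even if we exit.
        System.err.println("Uncaught throwable in thread "
                + thread.getName() + ": " + t);
        if (isFatal(t)) {
            // Exit code 1 is an arbitrary choice for this sketch.
            exitAction.accept(1);
        }
    }

    public static void main(String[] args) {
        // In a real process the exit action would be System.exit, so that an
        // external monitor can detect process death and restart the server.
        Thread.setDefaultUncaughtExceptionHandler(
                new FailFastHandler(System::exit));
    }
}
```

Passing the exit action in, rather than calling System.exit directly, is only a convenience for testing the policy; a production handler could just as well call System.exit inline.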