zookeeper-user mailing list archives

From "Shelley, Ryan" <Ryan.Shel...@disney.com>
Subject Re: Input on a change
Date Fri, 13 Apr 2012 18:22:13 GMT
Just my 2 cents... is the error code 1 the correct error code to return to
the OS? I'm just curious if anywhere else in ZooKeeper a System.exit(1)
may be called. It may make sense to either re-use that error code, or use
a different one (if 1 is already used elsewhere for a different type of
error, like "Invalid arguments" during start-up, for example).
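
As a rough sketch of the exit-code idea: the handler below exits with a dedicated code when an uncaught java.lang.Error shows up and only logs anything else. The class name, the exit code value, and the injected exit function are all made up for illustration; none of it is taken from the patch or from ZooKeeper. ThreadDeath is not special-cased here.

```java
import java.util.function.IntConsumer;

// Hypothetical handler; the class name and exit code are illustrative only.
class FailFastHandler implements Thread.UncaughtExceptionHandler {
    // Assumed value: 1 is avoided in case it already means something else.
    static final int EXIT_UNRECOVERABLE_ERROR = 10;

    private final IntConsumer exit;

    // Real use would pass System::exit; a test can pass a recorder instead.
    FailFastHandler(IntConsumer exit) {
        this.exit = exit;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        System.err.println("Uncaught throwable in thread " + t.getName() + ": " + e);
        if (e instanceof Error) {
            // Fail fast on any java.lang.Error (OutOfMemoryError included).
            exit.accept(EXIT_UNRECOVERABLE_ERROR);
        }
        // Other throwables are only logged.
    }

    // Installs the handler as the JVM-wide default for uncaught throwables.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new FailFastHandler(System::exit));
    }
}
```

Calling FailFastHandler.install() once at start-up would be enough to wire it in. A distinct code lets a monitoring script tell "died on an Error" apart from other exits.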

If the error isn't an OOME, is there any clean-up ZK needs to do to maybe
inform a cluster it's going down abruptly (maybe to gracefully begin a
leader re-election if necessary, for example)?
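
One possible shape for that kind of clean-up is a JVM shutdown hook, sketched below. Shutdown hooks run on System.exit and on normal termination, but not on Runtime.halt or when the process is killed forcibly, and in an OOM state the hook itself may fail. The class name and the cleanup callback are hypothetical; whether ZK needs any such step at all is the open question.

```java
// Hypothetical helper; nothing here is an existing ZooKeeper API.
class ShutdownNotifier {
    // Wraps the cleanup so a failure inside it cannot abort the hook thread
    // with yet another uncaught throwable.
    static Thread newHook(Runnable cleanup) {
        return new Thread(() -> {
            try {
                cleanup.run();
            } catch (Throwable t) {
                System.err.println("Cleanup failed during shutdown: " + t);
            }
        }, "shutdown-notifier");
    }

    // Registers the hook with the JVM and returns it so it can be removed.
    static Thread register(Runnable cleanup) {
        Thread hook = newHook(cleanup);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

The cleanup passed to register(...) should be short and best-effort, since the process is already on its way down when it runs.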

I'm +1 to fail-fast behavior.

Ryan

On 4/13/12 8:15 AM, "Scott Fines" <scottfines@gmail.com> wrote:

>On some JVMs (HotSpot for sure, but maybe others too?) there are JVM flags
>for performing actions on OutOfMemoryErrors (-XX:OnOutOfMemoryError="<cmd
>args>", -XX:+HeapDumpOnOutOfMemoryError and maybe some others that I can't
>remember off the top of my head). Will these triggers still be fired, or
>will the catch-all prevent them?
>
>I'm still +1 for the change no matter what, but it's probably a good idea
>to mention that in the docs if they don't work.
>
>Scott
>
>On Fri, Apr 13, 2012 at 10:09 AM, Camille Fournier
><camille@apache.org> wrote:
>
>> Hi everyone,
>>
>> I'm trying to evaluate a patch that Jeremy Stribling has submitted, and
>> I'd like some feedback from the user base on it.
>> https://issues.apache.org/jira/browse/ZOOKEEPER-1442
>>
>> The current behavior of ZK when we get an uncaught exception is to log it
>> and try to move on. This is arguably not the right thing to do, and will
>> possibly cause ZK to limp along with a bad VM (say, in an OOM state) for
>> longer than it should.
>> The patch proposes that when we get an instance of java.lang.Error, we
>> should call System.exit to fail fast. With the possible exception of
>> ThreadDeath (which may or may not be an unrecoverable system state
>> depending on the thread), I think this makes sense, but I would like
>> to hear from others if they have an opinion. I think it's better to kill
>> the process and let your monitoring services detect process death (and
>> thus restart) than possibly linger unresponsive for a while. Are there
>> scenarios that we're missing where this error can occur and you wouldn't
>> want the process killed?
>>
>> Thanks for your feedback,
>>
>> Camille
>>

