zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Camille Fournier <cami...@apache.org>
Subject Re: Input on a change
Date Sun, 15 Apr 2012 18:28:34 GMT
This is a good point.
I think this change should be fine for the server portion of the code,
since it's designed to be run as a standalone system. But for the
client connection to also call system.exit on such an error is
overreaching for all the reasons listed below.


2012/4/15 Віталій Тимчишин <tivv00@gmail.com>:
> I really would not like for any library to perform a System.exit call. This
> would make huge program exit out of sudden (think about j2ee, you may be
> bitten by security manager).  Note that there are more or less safe errors,
> like StackOverflowError.
> Also System.exit make testing nightmare. E.g. maven2 silently skips any
> tests after the one that calls System.exit. And everything's green.
> As for me good options are:
> 1) Call user-provided uncaught exception handler. Use the one from the
> thread that created the connection if one is not specified explicity.
> 1) Stop everything, notifying user with a global watcher. If it's possible,
> clean any static state (e.g. restart threads) and allow to restart
> connection.
> In any case, call user code. Good system already know how to react (it may
> want to send email to admin), allow it to perform well.
> Best regards, Vitalii Tymchyshyn.
> 2012/4/13 Camille Fournier <camille@apache.org>
>> Hi everyone,
>> I'm trying to evaluate a patch that Jeremy Stribling has submitted, and I'd
>> like some feedback from the user base on it.
>> https://issues.apache.org/jira/browse/ZOOKEEPER-1442
>> The current behavior of ZK when we get an uncaught exception is to log it
>> and try to move on. This is arguably not the right thing to do, and will
>> possibly cause ZK to limp along with a bad VM (say, in an OOM state) for
>> longer than it should.
>> The patch proposes that when we get an instance of java.lang.Error, we
>> should do a system.exit to fast-fail the process. With the possible
>> exception of ThreadDeath (which may or may not be an unrecoverable system
>> state depending on the thread), I think this makes sense, but I would like
>> to hear from others if they have an opinion. I think it's better to kill
>> the process and let your monitoring services detect process death (and thus
>> restart) than possibly linger unresponsive for a while, are there scenarios
>> that we're missing where this error can occur and you wouldn't want the
>> process killed?
>> Thanks for your feedback,
>> Camille
> --
> Best regards,
>  Vitalii Tymchyshyn

View raw message