zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeremy Stribling <st...@nicira.com>
Subject Re: Input on a change
Date Mon, 16 Apr 2012 18:38:53 GMT
I would be very happy with a solution that allows a custom uncaught 
exception handler -- that's how we are working around this issue now, by 
installing a new default uncaught exception handler after we're sure 
NIOServerCnxnFactory has installed its own.  I do think by default ZK 
should shut down predictably if it hits an error (with the possible 
exception of ThreadDeath), but turning it off by default until a major 
release seems reasonable if we allow a custom handler.

I don't know that I'm the right person to implement it though, not being 
super strong in Java or in the design of user inputs into ZK, so if 
anyone wants to take a crack at this, feel free.

On 04/16/2012 11:27 AM, Scott Fines wrote:
> Not to start a flame war here..
> Mocks are good for a lot of purposes, but I can think of a few cases where
> testing failure states is a lot easier if you are actively contacting a ZK
> server (like testing a failed client connection's consequences). To that
> end, it seems like a good idea to support this kind of testing. So far, the
> easiest way seems to be embedding a ZK Server.
> On the other hand, it doesn't seem likely that an active, production ZK
> cluster is going to want any kind of unusual behavior. It's almost always
> better to have some kind of behavior that is very well defined and
> predictable happen, even if that behavior is catastrophic. In this case,
> having a production server shut itself off whenever it doesn't know how to
> recover is very predictable, and can be designed around. Leaving that
> cluster in an awkward state that may or may not behave predictably is much
> harder to work with.
> I personally can't see much else that you'd want to do in this scenario
> that falls far enough outside these two patterns that it's worth supporting
> custom handlers. It seems like the support time would be better spent
> developing and planning around these two states.
> Scott
> On Mon, Apr 16, 2012 at 1:09 PM, John Sirois<john.sirois@gmail.com>  wrote:
>> On Mon, Apr 16, 2012 at 10:52 AM, Ishaaq Chandy<ishaaq@gmail.com>  wrote:
>>> Fair enough, if you think it is an anti-pattern then perhaps you can
>>> suggest an alternative that allows one to write tests that are (in
>>> descending order of priority):
>>> 1. Quick and easy to setup and teardown
>>> 2. Repeatably testable and debuggable in an IDE without having to resort
>> to
>>> the external build tool
>>> 3. Testable in parallel, i.e. having more than one build running on a CI
>>> server, so need some way to avoid port clashes
>>> 4. Cross-platform - i.e. run the exact same build sequence on multiple
>>> OSes.
>> We also embed for testing for all the reasons above with good success.  In
>> fact - I like the idea of the server product directly supporting alternate
>> mains - although this may be more burden on zk devs initially.  This would
>> allow an org to write their own main that plugs in an uncaught handler and
>> does whatever is appropriate in their deploy environment.
>>> Ishaaq
>>> On 16 April 2012 22:55, Camille Fournier<camille@apache.org>  wrote:
>>>> I believe that this change is inspired by someone that runs zk
>> embedded.
>>>> Personally I'm not moved by the testing argument, embedding the server
>>> for
>>>> testing is a bit of an anti pattern in my mind.
>>>>  From my phone
>>>> On Apr 15, 2012 11:20 PM, "Ishaaq Chandy"<ishaaq@gmail.com>  wrote:
>>>>> I'd go so far as to say that even the server-code should avoid
>>>> System.exit.
>>>>> Just because it is "meant" to be a standalone system doesn't mean
>> that
>>>> code
>>>>> that makes it impossible to embed it should be encouraged.
>>>>> For e.g, we embed a local version of ZK to be used inside our unit
>>> tests.
>>>>> This makes it much easier for us to control ZK to coincide with test
>>>>> expectations as well as making for much faster build times. It would
>>> be a
>>>>> shame if the embedded ZK started killing the JVM.
>>>>> Ishaaq
>>>>> On 16 April 2012 04:28, Camille Fournier<camille@apache.org>  wrote:
>>>>>> This is a good point.
>>>>>> I think this change should be fine for the server portion of the
>>> code,
>>>>>> since it's designed to be run as a standalone system. But for the
>>>>>> client connection to also call system.exit on such an error is
>>>>>> overreaching for all the reasons listed below.
>>>>>> C
>>>>>> 2012/4/15 Віталій Тимчишин<tivv00@gmail.com>:
>>>>>>> I really would not like for any library to perform a System.exit
>>>> call.
>>>>>> This
>>>>>>> would make huge program exit out of sudden (think about j2ee,
>>> may
>>>>> be
>>>>>>> bitten by security manager).  Note that there are more or less
>> safe
>>>>>> errors,
>>>>>>> like StackOverflowError.
>>>>>>> Also System.exit make testing nightmare. E.g. maven2 silently
>> skips
>>>> any
>>>>>>> tests after the one that calls System.exit. And everything's
>> green.
>>>>>>> As for me good options are:
>>>>>>> 1) Call user-provided uncaught exception handler. Use the one
>> from
>>>> the
>>>>>>> thread that created the connection if one is not specified
>>> explicity.
>>>>>>> 1) Stop everything, notifying user with a global watcher. If
>>>>>> possible,
>>>>>>> clean any static state (e.g. restart threads) and allow to
>> restart
>>>>>>> connection.
>>>>>>> In any case, call user code. Good system already know how to
>> react
>>>> (it
>>>>>> may
>>>>>>> want to send email to admin), allow it to perform well.
>>>>>>> Best regards, Vitalii Tymchyshyn.
>>>>>>> 2012/4/13 Camille Fournier<camille@apache.org>
>>>>>>>> Hi everyone,
>>>>>>>> I'm trying to evaluate a patch that Jeremy Stribling has
>>> submitted,
>>>>> and
>>>>>> I'd
>>>>>>>> like some feedback from the user base on it.
>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-1442
>>>>>>>> The current behavior of ZK when we get an uncaught exception
>> to
>>>> log
>>>>>> it
>>>>>>>> and try to move on. This is arguably not the right thing
to do,
>>> and
>>>>> will
>>>>>>>> possibly cause ZK to limp along with a bad VM (say, in an
>>> state)
>>>>> for
>>>>>>>> longer than it should.
>>>>>>>> The patch proposes that when we get an instance of
>>> java.lang.Error,
>>>> we
>>>>>>>> should do a system.exit to fast-fail the process. With the
>>> possible
>>>>>>>> exception of ThreadDeath (which may or may not be an
>> unrecoverable
>>>>>> system
>>>>>>>> state depending on the thread), I think this makes sense,
but I
>>>> would
>>>>>> like
>>>>>>>> to hear from others if they have an opinion. I think it's
>>> to
>>>>> kill
>>>>>>>> the process and let your monitoring services detect process
>> death
>>>> (and
>>>>>> thus
>>>>>>>> restart) than possibly linger unresponsive for a while, are
>> there
>>>>>> scenarios
>>>>>>>> that we're missing where this error can occur and you wouldn't
>>> want
>>>>> the
>>>>>>>> process killed?
>>>>>>>> Thanks for your feedback,
>>>>>>>> Camille
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>   Vitalii Tymchyshyn

View raw message