incubator-accumulo-user mailing list archives

From John Vines <john.w.vi...@ugov.gov>
Subject Re: Suspension
Date Wed, 15 Feb 2012 16:16:40 GMT
There are too many cases where a node legitimately died and we do not want
it constantly coming back and bogging things down. How do you design it to
restart the accidental deaths but not the deserved ones?
On Feb 15, 2012 11:11 AM, "Adam Fuchs" <adam.p.fuchs@ugov.gov> wrote:

> This isn't really just a laptop problem. We also see hiccups in clusters
> (admins accidentally the whole network, etc.) that we would want to
> automatically recover from. I think having self-restarting processes could
> be generally useful.
>
> I think that an option to disable ZooKeeper timeouts might lead to
> abuse, and could be very bad for stability under rare failure modes. We
> make a lot of assumptions throughout the code about these timeouts, and we
> would have to reconsider a large part of that model.
>
> Adam
>
>
> On Wed, Feb 15, 2012 at 10:56 AM, Billie J Rinaldi <billie.j.rinaldi@ugov.gov> wrote:
>
>> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <aaron@cordovas.org> wrote:
>> > Such an option would have to be very conspicuous so that users don't
>> > accidentally enable it and then wonder why bad tablet servers aren't
>> > removed automatically from the cluster.
>>
>> We could call it laptop.mode.
>>
>> Billie
>>
>
>
