Mailing-List: contact accumulo-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: accumulo-user@incubator.apache.org
Received-SPF: pass (nike.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: 
 <158413449.68371.1329322279137.JavaMail.root@linzimmb04o.imo.intelink.gov>
References: <BBD246D1-6D9A-4382-AB4F-BE32FA9DA913@cordovas.org>
	<292814158.68322.1329321393860.JavaMail.root@linzimmb04o.imo.intelink.gov>
	<158413449.68371.1329322279137.JavaMail.root@linzimmb04o.imo.intelink.gov>
Date: Wed, 15 Feb 2012 11:16:40 -0500
Message-ID: 
 <CADczPYRNFhreDQst31fbEJCpHqPO0Nd2Qfd+dMg7maugZk7Zmg@mail.gmail.com>
Subject: Re: Suspension
From: John Vines <john.w.vines@ugov.gov>
To: accumulo-user@incubator.apache.org
Content-Type: multipart/alternative; boundary=f46d043c807065a34504b903092d

--f46d043c807065a34504b903092d
Content-Type: text/plain; charset=ISO-8859-1

There are too many cases where a node legitimately died, and we do not want
it constantly coming back and bogging things down. How do you design it to
restart after accidental deaths but not after deserved ones?
On Feb 15, 2012 11:11 AM, "Adam Fuchs" <adam.p.fuchs@ugov.gov> wrote:

> This isn't really just a laptop problem. We also see hiccups in clusters
> (admins accidentally the whole network, etc.) that we would want to
> automatically recover from. I think having self-restarting processes could
> be generally useful.
>
> I think that an option to disable ZooKeeper timeouts might lead to
> abuse, and could be very bad for stability under rare failure modes. We
> make a lot of assumptions throughout the code about these timeouts, and we
> would have to reconsider a large part of that model.
>
> Adam
>
>
> On Wed, Feb 15, 2012 at 10:56 AM, Billie J Rinaldi <
> billie.j.rinaldi@ugov.gov> wrote:
>
>> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <
>> aaron@cordovas.org> wrote:
>> > Such an option would have to be very conspicuous so that users don't
>> > accidentally enable it and then wonder why bad tablet servers aren't
>> > removed automatically from the cluster.
>>
>> We could call it laptop.mode.
>>
>> Billie
>>
>
>

--f46d043c807065a34504b903092d--