From: John Vines
Date: Wed, 15 Feb 2012 15:06:44 -0500
Subject: Re: Suspension
To: accumulo-user@incubator.apache.org

Perhaps we want a suspend option that allows the ZooKeeper (ZK) timeouts one large skew before they expect normal behavior again?

John

On Wed, Feb 15, 2012 at 12:20 PM, Aaron Cordova wrote:
> Yeah, we don't want to let designing a restart service distract us from
> the suspension discussion.
>
> Issuing a 'suspend' command sounds like a third option.
>
> So far we have:
>
> 1) run Accumulo in a mode that ignores long timeouts (perhaps enabled just
> before suspension)
> 2) let Accumulo die (no modification to Accumulo) and rely on a
> to-be-created restart service
> 3) issue a command to suspend processes before suspending the VM / OS
>
> Perhaps the 'suspend' command just makes Accumulo ignore timeouts, but if
> you're gonna issue a command, you might as well just issue the 'shutdown'
> command.
>
> What's the start-up time like for large clusters nowadays?
>
> Also, what is the effect of taking all tables offline?
>
> On Feb 15, 2012, at 12:12 PM, David Medinets wrote:
>
> > It seems like the conversation has wandered away from the main point:
> > marking a node as suspended instead of having a monitoring service
> > discover that it is non-responsive. Would it be possible to issue a
> > command-line 'suspend' command, and then a 'resume' command when the
> > user is ready to have the node back in the cluster?
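John's suggestion can be sketched as a heartbeat check that forgives exactly one oversized gap after a suspend call. This is a minimal, hypothetical sketch: the class `SkewTolerantTimeout`, its methods, and the 30-second timeout are illustrative assumptions, not Accumulo or ZooKeeper APIs, and it does not model how a ZooKeeper server actually expires sessions. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
class SkewTolerantTimeout:
    """Hypothetical heartbeat check with a one-shot allowance for a large gap.

    Not a ZooKeeper or Accumulo API; timestamps are supplied by the caller.
    """

    def __init__(self, timeout_s: float, start: float) -> None:
        self.timeout_s = timeout_s   # normal maximum gap between heartbeats
        self.last_seen = start       # time of the most recent heartbeat
        self.skew_pending = False    # set by suspend(), cleared by the next heartbeat

    def suspend(self) -> None:
        """Called just before the VM/OS is suspended."""
        self.skew_pending = True

    def heartbeat(self, now: float) -> bool:
        """Record a heartbeat at `now`; return False if the session should expire."""
        gap = now - self.last_seen
        self.last_seen = now
        # The allowance covers exactly one heartbeat after suspend(), however
        # long the gap was; after that, the normal timeout applies again.
        allowed = self.skew_pending
        self.skew_pending = False
        return gap <= self.timeout_s or allowed


if __name__ == "__main__":
    t = SkewTolerantTimeout(timeout_s=30.0, start=0.0)
    t.suspend()
    print(t.heartbeat(3600.0))  # True: the hour-long gap is forgiven once
    print(t.heartbeat(3700.0))  # False: a 100 s gap now exceeds the timeout
```

Clearing the allowance on the first heartbeat after the suspend, rather than only when a large gap is actually seen, keeps an unused suspend from masking a real failure later.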
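David's proposal, an explicit 'suspend' marker followed later by a 'resume', instead of letting a monitoring service discover that the node is non-responsive, could look roughly like the following. This is also a hypothetical sketch: `NodeMonitor`, its methods, the node names `tserver1`/`tserver2`, and the timeout value are assumptions for illustration and do not correspond to Accumulo's actual monitor or master code.

```python
from typing import Dict, List, Set


class NodeMonitor:
    """Hypothetical monitor that never reports explicitly suspended nodes as dead."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}
        self.suspended: Set[str] = set()

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def suspend(self, node: str) -> None:
        # Operator-issued 'suspend': the node stays registered but is not timed out.
        self.suspended.add(node)

    def resume(self, node: str, now: float) -> None:
        # Operator-issued 'resume': restart the clock so the suspended period
        # is not counted against the node.
        self.suspended.discard(node)
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if node not in self.suspended and now - seen > self.timeout_s
        )


if __name__ == "__main__":
    m = NodeMonitor(timeout_s=30.0)
    m.heartbeat("tserver1", 0.0)
    m.heartbeat("tserver2", 0.0)
    m.suspend("tserver2")
    m.heartbeat("tserver1", 100.0)
    print(m.dead_nodes(110.0))  # []: tserver1 is fresh, tserver2 is suspended
    m.resume("tserver2", 500.0)
    print(m.dead_nodes(510.0))  # ['tserver1']: silent for 410 s
```

Unlike the skew allowance, this keeps the decision with the operator: a suspended node is exempt for as long as it stays marked, and the timeout only resumes once 'resume' is issued.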