Message-ID: <78035500.1223570084393.JavaMail.jira@brutus>
Date: Thu, 9 Oct 2008 09:34:44 -0700 (PDT)
From: "Steve Loughran (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Subject: [jira] Updated: (HADOOP-3628) Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
In-Reply-To: <1542900921.1214303805031.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-3628:
-----------------------------------

    Attachment: hadoop-3628.patch

This patch isn't ready to go into the code; I'm putting it up to show progress.

- ping() now builds up a status that can include a list of exceptions, so a ping failure can list everything that is wrong. One thing that is unclear: should a ping() failure trigger a transition into the FAILED state, or should the code hope that the system can recover?

- The Service class implements Closeable and that interface's method, void close() throws IOException. It means "close me, and you can throw an exception". terminate() relays to close() and logs the failure. If you want to close quietly, use terminate(); if you are OK with an exception, call close(). Anything aggregating other services has to call terminate() or otherwise discard the exceptions, so that it does a best-effort shutdown.

- I've moved MockService into the core, though tools is probably a better home. Why out of test? Because it turns out to be invaluable for Hadoop management tools to test their ability to handle failures in the service lifecycle.

I don't think ping() and the termination logic are stable yet; they seem overly complex. I'd rather have Closeable and close() everywhere, including in other things that are aggregate services (MiniDFSCluster, MiniMRCluster).

> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-3628
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3628
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs, mapred
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: AbstractHadoopComponent.java, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-lifecycle.pdf, hadoop-lifecycle.sxw
>
>
> I'd like to propose we have a standard interface for Hadoop components, the things that get started or stopped when you bring up a namenode. Currently, some of these classes have a stop() or shutdown() method, with no standard name or interface, and there is no way of seeing if they are live, checking their health, or shutting them down reliably. Indeed, there is a tendency for the spawned threads to not want to die, and to require the entire process to be killed to stop the workers.
> Having a standard interface would make it easier for
>  * management tools to manage the different things
>  * monitoring the state of things
>  * subclassing
> The last point is interesting, as right now TaskTracker and JobTracker start up threads in their constructors; that's very dangerous, as subclasses may have their methods called before they are fully initialised. Adding this interface would be the right time to clean up the startup process so that subclassing is less risky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
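
For readers of the archive, here is a rough Java sketch of the lifecycle ideas discussed in this message: work is started in start() rather than in the constructor, ping() collects every problem into a status object, close() may throw, and terminate() relays to close() and logs instead of throwing. This is not the code in hadoop-3628.patch. The names SketchService, FlakyService, Status, State and the inner*() hooks are all made up for illustration, and since the message leaves open whether a failed ping() should move the service to FAILED, the sketch simply leaves the state unchanged.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical lifecycle base class; an illustration, not the patch's Service class. */
abstract class SketchService implements Closeable {
    enum State { CREATED, STARTED, FAILED, CLOSED }

    /** Result of a ping(): every problem found is recorded, not just the first. */
    static final class Status {
        private final List<Throwable> problems = new ArrayList<>();
        void add(Throwable t) { problems.add(t); }
        boolean isHealthy() { return problems.isEmpty(); }
        List<Throwable> getProblems() { return Collections.unmodifiableList(problems); }
    }

    private State state = State.CREATED;

    State getState() { return state; }

    /** Threads belong here, not in the constructor, so subclasses are fully built first. */
    final void start() throws IOException {
        if (state != State.CREATED) {
            throw new IllegalStateException("cannot start from state " + state);
        }
        try {
            innerStart();
            state = State.STARTED;
        } catch (IOException e) {
            state = State.FAILED;
            throw e;
        }
    }

    /** Health check. Assumption: a failed ping does not change the state. */
    final Status ping() {
        Status status = new Status();
        if (state != State.STARTED) {
            status.add(new IllegalStateException("not started: " + state));
        } else {
            innerPing(status);
        }
        return status;
    }

    /** Noisy shutdown: the caller sees the exception. Repeated calls do nothing. */
    @Override
    public final void close() throws IOException {
        if (state == State.CLOSED) {
            return;
        }
        state = State.CLOSED; // marked closed even if the cleanup below fails
        innerClose();
    }

    /** Quiet shutdown: relays to close() and logs the failure instead of throwing. */
    final void terminate() {
        try {
            close();
        } catch (IOException e) {
            System.err.println("Ignoring failure while closing " + this + ": " + e);
        }
    }

    protected abstract void innerStart() throws IOException;
    protected abstract void innerPing(Status status);
    protected abstract void innerClose() throws IOException;
}

/** Hypothetical failure-injecting service for exercising callers; not the patch's MockService. */
class FlakyService extends SketchService {
    private final boolean failOnClose;
    boolean healthy = true;

    FlakyService(boolean failOnClose) { this.failOnClose = failOnClose; }

    @Override protected void innerStart() { }

    @Override protected void innerPing(Status status) {
        if (!healthy) {
            status.add(new IOException("simulated unresponsive worker"));
        }
    }

    @Override protected void innerClose() throws IOException {
        if (failOnClose) {
            throw new IOException("simulated close failure");
        }
    }
}
```

Under these assumptions, something that aggregates several services would call terminate() on each child in turn, so that one child failing to close does not stop the rest from being shut down, while a caller that wants to see the failure calls close() directly.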