Message-ID: <1189560866.1244436008067.JavaMail.jira@brutus>
In-Reply-To: <138607381.1236871130868.JavaMail.jira@brutus>
Date: Sun, 7 Jun 2009 21:40:08 -0700 (PDT)
From: "Hong Tang (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5478) Provide a node health check script and run it periodically to check the node health status

[
https://issues.apache.org/jira/browse/HADOOP-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717146#action_12717146 ]

Hong Tang commented on HADOOP-5478:
-----------------------------------

+1. Might it be even easier to just hook the script up to cron?

> Provide a node health check script and run it periodically to check the node health status
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5478
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Aroop Maliakkal
>            Assignee: Vinod K V
>         Attachments: hadoop-5478-1.patch
>
>
> Hadoop must have some mechanism to find the health status of a node. It should run the health check script periodically and, if there are any errors, blacklist the node. This will be really helpful when we run static mapred clusters. Otherwise we may have to run scripts or daemons periodically to find the node status and take the node offline manually.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.