Message-ID: <2116648831.1212732045898.JavaMail.jira@brutus>
Date: Thu, 5 Jun 2008 23:00:45 -0700 (PDT)
From: "Hemanth Yamijala (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Updated: (HADOOP-3184) HOD gracefully exclude "bad" nodes during ring formation
In-Reply-To: <1772148840.1207351944300.JavaMail.jira@brutus>
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm

     [ https://issues.apache.org/jira/browse/HADOOP-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-3184:
-------------------------------------

    Release Note:   (was: Running through Hudson.)

> HOD gracefully exclude "bad" nodes during ring formation
> --------------------------------------------------------
>
>                 Key: HADOOP-3184
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3184
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>            Reporter: Marco Nicosia
>            Assignee: Hemanth Yamijala
>             Fix For: 0.18.0
>
>         Attachments: 3184.1.patch, 3184.2.patch
>
>
> HOD clusters sometimes fail to allocate due to a single "bad" node. During ring formation, the entire ring should not depend on every single node being good. Instead, it should exclude any ring member that does not adequately join the ring within a specified amount of time.
> This is a frequent HOD user issue (although not directly caused by HOD).
> Examples of bad nodes: missing Java, an incorrect version of HOD or Hadoop, a corrupt local name cache, slow network links, drives just beginning to fail, etc.
> Many of these conditions are known, and we can monitor for those separately, but this enhancement would shield users from unknown failure conditions that we haven't yet anticipated. This way, a user gets a cluster instead of the allocation hanging indefinitely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.