Message-ID: <1105817037.1225451144410.JavaMail.jira@brutus>
Date: Fri, 31 Oct 2008 04:05:44 -0700 (PDT)
From: "Steve Loughran (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4305) repeatedly blacklisted tasktrackers should get declared dead
In-Reply-To: <1612005475.1222700624226.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644263#action_12644263 ]

Steve Loughran commented on HADOOP-4305:
----------------------------------------

I'd be happiest if there were some way of reporting this to a policy component that made the right decision, because the action you take on a managed-VM cluster is different from the one you take with Hadoop on physical machines.

On physical, you blacklist and maybe trigger a reboot. Or you start running well-known health tasks to see which parts of the system appear healthy.

On a VM cluster you just delete that node and create a new one. There is no need to faff around with the state of the VM if it is a task-only VM; if it's also a datanode, you have to decommission it first.

> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When running a batch of jobs it often happens that the same tasktrackers are blacklisted again and again. This can slow job execution considerably, in particular, when tasks fail because of timeout.
> It would make sense to no longer assign any tasks to such tasktrackers and to declare them dead.
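
To make the "policy component" idea in the comment concrete, here is a minimal sketch of how such a component might be shaped. Everything in it is hypothetical: the names NodeFailurePolicy, PhysicalClusterPolicy, VmClusterPolicy, NodeInfo and Action are invented for illustration and are not part of Hadoop, and the sketch only returns the list of actions a deployment might take rather than performing them. The rebootAllowed flag is an assumption standing in for the "maybe trigger a reboot" choice on physical hardware.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch only; none of these types exist in Hadoop. */
class NodeFailurePolicyDemo {

    /** Actions a policy may request for a repeatedly blacklisted tasktracker. */
    enum Action {
        BLACKLIST, REBOOT, RUN_HEALTH_TASKS, DECOMMISSION_DATANODE, DELETE_AND_RECREATE_VM
    }

    /** Illustrative description of the failing node. */
    static final class NodeInfo {
        final String host;
        final boolean alsoDatanode;

        NodeInfo(String host, boolean alsoDatanode) {
            this.host = host;
            this.alsoDatanode = alsoDatanode;
        }
    }

    /** The pluggable decision point: the caller reports, the policy decides. */
    interface NodeFailurePolicy {
        List<Action> onRepeatedBlacklist(NodeInfo node);
    }

    /** Physical hardware: blacklist, then reboot or probe with health tasks. */
    static final class PhysicalClusterPolicy implements NodeFailurePolicy {
        private final boolean rebootAllowed; // assumption: a site-level setting

        PhysicalClusterPolicy(boolean rebootAllowed) {
            this.rebootAllowed = rebootAllowed;
        }

        @Override
        public List<Action> onRepeatedBlacklist(NodeInfo node) {
            return rebootAllowed
                    ? Arrays.asList(Action.BLACKLIST, Action.REBOOT)
                    : Arrays.asList(Action.BLACKLIST, Action.RUN_HEALTH_TASKS);
        }
    }

    /** Managed VMs: replace the node, decommissioning first if it holds data. */
    static final class VmClusterPolicy implements NodeFailurePolicy {
        @Override
        public List<Action> onRepeatedBlacklist(NodeInfo node) {
            List<Action> actions = new ArrayList<>();
            if (node.alsoDatanode) {
                actions.add(Action.DECOMMISSION_DATANODE);
            }
            actions.add(Action.DELETE_AND_RECREATE_VM);
            return Collections.unmodifiableList(actions);
        }
    }

    public static void main(String[] args) {
        NodeInfo node = new NodeInfo("tracker-17", true);
        System.out.println(new PhysicalClusterPolicy(true).onRepeatedBlacklist(node));
        System.out.println(new VmClusterPolicy().onRepeatedBlacklist(node));
    }
}
```

The point of the interface is that the code detecting repeated blacklisting would not need to know which kind of cluster it runs on; a deployment would supply whichever policy matches its infrastructure.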