Message-ID: <1935717865.1228226324432.JavaMail.jira@brutus>
Date: Tue, 2 Dec 2008 05:58:44 -0800 (PST)
From: "Devaraj Das (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4305) repeatedly blacklisted tasktrackers should get declared dead
In-Reply-To: <1612005475.1222700624226.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652368#action_12652368 ]

Devaraj Das commented on HADOOP-4305:
-------------------------------------

Some comments:
1. Format the brackets of the if condition properly in the incrementFaults method.
2. You should be able to use the same data structure for both potentiallyFaulty and blacklisted trackers.
3. Add a comment for mapred.cluster.average.blacklist.threshold saying that it exists solely for tuning purposes, and that once this feature has been tested on real clusters and an appropriate value for the threshold has been found, this config might be taken out.
4. Check whether you can remove the initialContact flag and use only the restarted flag in the heartbeat method. This is a more serious change, but it might be worthwhile for simplifying the state machine.

> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>         Attachments: patch-4305-0.18.txt, patch-4305-1.txt, patch-4305-2.txt
>
>
> When running a batch of jobs it often happens that the same tasktrackers are blacklisted again and again. This can slow job execution considerably, in particular when tasks fail because of timeouts.
> It would make sense to no longer assign any tasks to such tasktrackers and to declare them dead.
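
For illustration only, comment 2 above (one data structure for both potentiallyFaulty and blacklisted trackers) could look roughly like the sketch below. This is not the code from the attached patches: the class TrackerFaults, the FaultInfo record, the minFaults floor, and the exact blacklisting rule (fault count above the cluster average by the threshold fraction) are all assumptions made for the example. Only the names incrementFaults and mapred.cluster.average.blacklist.threshold come from the review.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a single map holds fault counts and blacklist state. */
class TrackerFaults {

    /** Per-tracker record; a tracker with an entry that is not blacklisted is "potentially faulty". */
    static final class FaultInfo {
        int numFaults;
        boolean blacklisted;
    }

    private final Map<String, FaultInfo> faultyTrackers = new HashMap<>();
    // Assumed stand-in for the value of mapred.cluster.average.blacklist.threshold.
    private final double averageBlacklistThreshold;
    // Assumed floor so that a single early fault does not blacklist a tracker.
    private final int minFaults;
    private final int clusterSize;
    private int totalFaults;

    TrackerFaults(double averageBlacklistThreshold, int minFaults, int clusterSize) {
        this.averageBlacklistThreshold = averageBlacklistThreshold;
        this.minFaults = minFaults;
        this.clusterSize = clusterSize;
    }

    /** Records one fault for the host and returns whether the host is now blacklisted. */
    boolean incrementFaults(String host) {
        FaultInfo info = faultyTrackers.computeIfAbsent(host, h -> new FaultInfo());
        info.numFaults++;
        totalFaults++;
        double average = (double) totalFaults / clusterSize;
        // Assumed rule: blacklist when the count exceeds the cluster average by the threshold fraction.
        if (!info.blacklisted
                && info.numFaults >= minFaults
                && info.numFaults > average * (1 + averageBlacklistThreshold)) {
            info.blacklisted = true;
        }
        return info.blacklisted;
    }

    boolean isBlacklisted(String host) {
        FaultInfo info = faultyTrackers.get(host);
        return info != null && info.blacklisted;
    }

    boolean isPotentiallyFaulty(String host) {
        FaultInfo info = faultyTrackers.get(host);
        return info != null && !info.blacklisted;
    }
}
```

With one record per tracker, moving a tracker from potentially faulty to blacklisted is a flag change on the same entry, so the two sets cannot drift out of sync.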