From: "Amareshwari Sriramadasu (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Date: Tue, 16 Feb 2010 10:29:27 +0000 (UTC)
Subject: [jira] Updated: (MAPREDUCE-1398) TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.
Message-ID: <1814373044.296441266316167974.JavaMail.jira@brutus.apache.org>
In-Reply-To: <442856609.7031264180161572.JavaMail.jira@brutus.apache.org>

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1398:
-----------------------------------------------

    Attachment: patch-1398-ydist.txt

Patch for Yahoo! distribution.
Ran ant test and test-patch. test-patch failed because of MAPREDUCE-1497.
All unit tests passed except TestNodeRefresh (due to MAPREDUCE-677). TestNodeRefresh passed when I reran the test.

> TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1398
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1398
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Hemanth Yamijala
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>
>         Attachments: patch-1398-1.txt, patch-1398-2.txt, patch-1398-ydist.txt, patch-1398.txt
>
>
> Tasks can be assigned to trackers for slots that are still running other tasks in a commit-pending state. This is an optimization done to pipeline task assignment and launch. When the task reaches the tracker, it waits until sufficient slots become free for it. This wait is done in the TaskLauncher thread. If the task is killed externally while waiting (for example, because the job finishes), the TaskLauncher is not notified. So, it continues to wait for the killed task to get sufficient slots. If slots do not become free for a long time, the TaskLauncher thread stays blocked for a considerable time.
> If the waiting task happens to be a high-RAM task, the wait is also wasteful: if the TaskLauncher woke up, it could make way for normal tasks that can run on the slots that are already available.
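
To illustrate the behaviour described above, here is a minimal sketch of a slot wait that also wakes up when the waiting task is killed. It is not the attached patch and not the actual TaskTracker code; SlotWaiter, Task, awaitSlots, kill and releaseSlots are hypothetical names chosen for this example.

```java
/** Hypothetical, simplified model of a launcher waiting for free slots. */
final class SlotWaiter {
    /** Hypothetical task record; 'killed' is guarded by SlotWaiter.lock. */
    static final class Task {
        final int slotsNeeded;
        private boolean killed;

        Task(int slotsNeeded) {
            this.slotsNeeded = slotsNeeded;
        }
    }

    private final Object lock = new Object();
    private int freeSlots;

    SlotWaiter(int freeSlots) {
        this.freeSlots = freeSlots;
    }

    /**
     * Blocks until enough slots are free or the task is killed.
     * Returns true if the slots were reserved, false if the task was
     * killed (or the thread was interrupted) before that happened.
     */
    boolean awaitSlots(Task task) {
        synchronized (lock) {
            // Re-check both conditions after every wake-up.
            while (!task.killed && freeSlots < task.slotsNeeded) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            if (task.killed) {
                return false; // give up so the next task can be considered
            }
            freeSlots -= task.slotsNeeded;
            return true;
        }
    }

    /** Marks the task killed and wakes the waiter so it stops waiting. */
    void kill(Task task) {
        synchronized (lock) {
            task.killed = true;
            lock.notifyAll();
        }
    }

    /** Returns slots to the pool and wakes the waiter to re-check. */
    void releaseSlots(int n) {
        synchronized (lock) {
            freeSlots += n;
            lock.notifyAll();
        }
    }
}
```

In this sketch the key point is that kill() notifies on the same lock that awaitSlots() waits on, so a killed high-RAM task stops blocking the launcher and a normal task can take the slots that are already free.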