From: "Amareshwari Sriramadasu (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Date: Wed, 10 Feb 2010 08:54:27 +0000 (UTC)
Subject: [jira] Created: (MAPREDUCE-1475) Race condition while launching task cleanup attempt.

Race condition while launching task cleanup attempt.
----------------------------------------------------

                 Key: MAPREDUCE-1475
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1475
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tasktracker
    Affects Versions: 0.20.1
            Reporter: Amareshwari Sriramadasu

We found a race condition while launching a task cleanup attempt on a TaskTracker that permanently consumes a slot.

The scenario is the following:
The main attempt is killed by the TaskTracker because it was a speculative attempt.
A cleanup attempt is launched on the same tracker.
The cleanup attempt occupies the slot and is about to start.
However, a done() RPC from the earlier attempt is still pending in the RPC queue.
Before the cleanup attempt can be launched, the TaskTracker processes the RPC from the earlier attempt and sets the state of the cleanup attempt to KILLED.
The launcher does not launch the cleanup attempt because it is already KILLED.
The done() RPC itself fails with a NullPointerException because of the inconsistent state.

In summary, the slot was occupied by a cleanup attempt that could never be launched, and the slot was never released.
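The ordering described above can be replayed deterministically in a small sketch. This is not TaskTracker code: the class `SlotLeakSketch`, the `Tracker` type, its methods, and the `releaseOnSkip` flag are all hypothetical names invented for illustration. The sketch also assumes that the cleanup attempt is tracked under the same attempt ID as the killed attempt, which is how a stale done() could change its state. The `releaseOnSkip` variant only shows what the slot count would look like if a skipped launch gave the slot back; it is not necessarily the fix adopted for this issue, and the NullPointerException is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

public class SlotLeakSketch {
    enum State { UNASSIGNED, RUNNING, KILLED }

    // Hypothetical stand-in for a tracker with a fixed number of slots.
    static class Tracker {
        int freeSlots;
        final boolean releaseOnSkip;
        // Assumption: state is keyed by attempt ID, and the cleanup attempt
        // reuses the ID of the attempt that was killed.
        final Map<String, State> attempts = new HashMap<>();

        Tracker(int slots, boolean releaseOnSkip) {
            this.freeSlots = slots;
            this.releaseOnSkip = releaseOnSkip;
        }

        // The cleanup attempt is assigned and takes a slot before it starts.
        void reserveSlot(String attemptId) {
            freeSlots--;
            attempts.put(attemptId, State.UNASSIGNED);
        }

        // A stale done() from the earlier attempt is processed and, because
        // of the shared ID, marks the pending cleanup attempt as KILLED.
        void staleDone(String attemptId) {
            attempts.put(attemptId, State.KILLED);
        }

        // The launcher refuses to start an attempt that is already KILLED.
        boolean launch(String attemptId) {
            if (attempts.get(attemptId) == State.KILLED) {
                if (releaseOnSkip) {
                    freeSlots++; // hypothetical mitigation: give the slot back
                }
                return false;
            }
            attempts.put(attemptId, State.RUNNING);
            return true;
        }
    }

    // Replays the reported ordering and returns the free slot count afterwards.
    public static int simulate(boolean releaseOnSkip) {
        Tracker tracker = new Tracker(1, releaseOnSkip);
        String id = "attempt_0"; // hypothetical attempt ID
        tracker.reserveSlot(id);
        tracker.staleDone(id);
        tracker.launch(id);
        return tracker.freeSlots;
    }

    public static void main(String[] args) {
        System.out.println("free slots without release on skip: " + simulate(false));
        System.out.println("free slots with release on skip: " + simulate(true));
    }
}
```

With a single slot, `simulate(false)` ends with zero free slots even though nothing is running, which matches the leak reported here; `simulate(true)` ends with the slot available again.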