Date: Thu, 18 Apr 2013 00:45:16 +0000 (UTC)
From: "Karthik Kambatla (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated MAPREDUCE-5110:
----------------------------------------

    Attachment: mr-5110-half-tt-expiry.patch

[~vinodkv], here is a new patch that uses half the TT expiry interval as the timeout for task launch. Do you think this is a reasonable way to go about it, or would it be better to add a job-specific parameter? I'll validate the patch we finalize on a cluster.
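The proposal above (launch timeout = half the TT expiry interval) can be illustrated with a minimal Java sketch. It assumes that the TaskTracker records when an attempt was assigned and checks the elapsed time before starting it, and that the JobTracker expires a still-launching attempt only after the full expiry interval. The class and method names (`LaunchTimeoutSketch`, `launchTimeoutMs`, `mayStartAttempt`) are hypothetical, and this is not the code in mr-5110-half-tt-expiry.patch.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the MAPREDUCE-5110 patch): derive a task-launch
 * timeout from the TaskTracker expiry interval and refuse to start an
 * attempt whose launch has already taken longer than that timeout.
 */
public class LaunchTimeoutSketch {

  /** Launch timeout = half of the TT expiry interval, as proposed in the comment. */
  public static long launchTimeoutMs(long ttExpiryIntervalMs) {
    return ttExpiryIntervalMs / 2;
  }

  /**
   * Returns true if the attempt may still be started. If the launch has taken
   * too long, the JobTracker may already have expired this attempt and
   * scheduled another one, so starting it could produce two parallel attempts.
   */
  public static boolean mayStartAttempt(long assignedAtMs, long nowMs, long ttExpiryIntervalMs) {
    long elapsedMs = nowMs - assignedAtMs;
    return elapsedMs < launchTimeoutMs(ttExpiryIntervalMs);
  }

  public static void main(String[] args) {
    // Illustrative expiry interval of 10 minutes (assumed value for this example).
    long expiryMs = TimeUnit.MINUTES.toMillis(10);
    long assignedAt = 0L;
    // 3 minutes elapsed is under the 5-minute launch timeout: prints true.
    System.out.println(mayStartAttempt(assignedAt, TimeUnit.MINUTES.toMillis(3), expiryMs));
    // 6 minutes elapsed exceeds the 5-minute launch timeout: prints false.
    System.out.println(mayStartAttempt(assignedAt, TimeUnit.MINUTES.toMillis(6), expiryMs));
  }
}
```

Using half the interval rather than the full interval presumably leaves a safety margin before the JobTracker gives up on the attempt; the job-specific parameter mentioned as an alternative would replace the fixed divisor with a configurable value.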
> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110-half-tt-expiry.patch, mr-5110.patch, mr-5110.patch, mr-5110-tt-only.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules another attempt. The earlier attempt can start after the later attempt, leading to two parallel attempts running at the same time. This is particularly an issue if the user turns off speculation and expects a single attempt of a task to run at any point in time.