Date: Tue, 16 Apr 2013 03:50:18 +0000 (UTC)
From: "Vinod Kumar Vavilapalli (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task

[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632543#comment-13632543 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5110:
----------------------------------------------------

Trying to understand this, mostly agree with what Arun said.

To summarize:
 - Strictly guaranteeing serial execution of task attempts is not possible in general and is a non-requirement.
 - The JT already deals with all kinds of task slowness, and irrespective of this patch, clients have to deal with that slowness.

bq. Where possible (i.e., not transient network partitions), run a single task attempt for a task when speculation is turned off

This seems an arbitrary non-requirement; I don't see what we gain from it.

The JIRA started with the above goal, which isn't worth pursuing from what I see, but it now seems to have transformed into something more benign.

I looked at the patch. It looks like you want quicker failure when tasks are getting launched/localized, to meet some kind of SLA? If that is the case, then instead of calling it a 'TT-side implementation', we could call it an aggressive timeout enforced on TTs for tasks and make it job-configurable. That should do, right?

> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, mr-5110-tt-only.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules another attempt. The earlier attempt can start after the later attempt, leading to two parallel attempts running at the same time. This is particularly an issue if the user turns off speculation and expects a single attempt of a task to run at any point in time.
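
For readers following the thread, the race in the issue description and the effect of a TT-enforced launch timeout can be illustrated with a toy model. This is only a sketch and not Hadoop code: the names `Attempt`, `schedule`, `running_attempts`, and `tt_launch_timeout`, and all of the timing values, are hypothetical. It also assumes that attempts never finish once started and that the TT always learns of its own timeout, which ignores the network-partition cases mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    attempt_id: int
    assigned_at: float   # time the JT handed the attempt to a TT (seconds)
    launch_delay: float  # time the TT needs to localize and launch it


def schedule(first_launch_delay, jt_expiry, retry_launch_delay):
    """Model the JT: assign attempt 0 at t=0 and, if it has not
    launched within jt_expiry seconds, schedule attempt 1."""
    attempts = [Attempt(0, 0.0, first_launch_delay)]
    if first_launch_delay > jt_expiry:
        attempts.append(Attempt(1, jt_expiry, retry_launch_delay))
    return attempts


def running_attempts(attempts, now, tt_launch_timeout=None):
    """Return the ids of attempts that have started by time `now`.

    If tt_launch_timeout is set, the TT abandons any attempt whose
    launch would take longer than that many seconds.
    """
    running = []
    for a in attempts:
        if tt_launch_timeout is not None and a.launch_delay > tt_launch_timeout:
            continue  # the TT gave up on this launch, so it never starts
        if a.assigned_at + a.launch_delay <= now:
            running.append(a.attempt_id)
    return running


# Attempt 0 needs 700s to launch, but the JT expires it after 600s.
attempts = schedule(first_launch_delay=700, jt_expiry=600, retry_launch_delay=5)

# Without a TT-side timeout both attempts end up running in parallel.
print(running_attempts(attempts, now=800))                         # [0, 1]

# With a TT-side timeout no larger than the JT expiry, only the retry runs.
print(running_attempts(attempts, now=800, tt_launch_timeout=600))  # [1]
```

In this model, making the timeout job-configurable simply means letting each job choose its own `tt_launch_timeout`; a job with a tight SLA would pick a smaller value and fail slow launches sooner.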