Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mapreduce-issues@hadoop.apache.org
Date: Fri, 5 Apr 2013 18:43:17 +0000 (UTC)
From: "Karthik Kambatla (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: <JIRA.12639500.1364432334130.118244.1365187397431@arcas>
In-Reply-To: <JIRA.12639500.1364432334130@arcas>
References: <JIRA.12639500.1364432334130@arcas>
Subject: [jira] [Commented] (MAPREDUCE-5110) Long task launch delays can
 lead to multiple parallel attempts of the task
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623925#comment-13623925 ] 

Karthik Kambatla commented on MAPREDUCE-5110:
---------------------------------------------

Hey Arun, sorry for the delay. I was trying to figure out the root cause behind these occasional launch delays, which we encounter once in a while on a highly loaded cluster. It looks like a node-specific hardware/OS issue. When it happens, the task in question delays the entire job.

I still believe limiting the task launch time is helpful, particularly in the case of node-specific hardware issues such as failing disks or slow networks. Also, I discussed this offline with Alejandro and Tom, and they suggested we might not want to introduce a new config for this, but maybe use half of mapred.task.timeout instead. What do you think of that?
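A minimal sketch of the "half of mapred.task.timeout" idea, under stated assumptions; it is not the attached patch. LaunchTimeoutSketch, launchTimeoutMs and shouldAbandonLaunch are hypothetical names, and java.util.Properties stands in for Hadoop's Configuration. A non-positive timeout is treated as "no launch limit", mirroring the documented meaning of 0 for mapred.task.timeout (default 600000 ms in Hadoop 1.x). One plausible motivation for using half, assumed here, is that the TaskTracker would give up on a stuck launch well before the JobTracker expires the attempt and schedules another, so the stale attempt is dropped instead of starting alongside its replacement.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: derive a task-launch timeout from mapred.task.timeout
 * and decide whether a late launch should be abandoned.
 * Names are illustrative only; they are not Hadoop APIs.
 */
public class LaunchTimeoutSketch {
    // Hadoop 1.x default for mapred.task.timeout: 600000 ms (10 minutes).
    static final long DEFAULT_TASK_TIMEOUT_MS = 600_000L;

    /** Half of mapred.task.timeout, or 0 (no limit) if the timeout is disabled. */
    static long launchTimeoutMs(Properties conf) {
        long taskTimeout = Long.parseLong(conf.getProperty(
            "mapred.task.timeout", Long.toString(DEFAULT_TASK_TIMEOUT_MS)));
        return taskTimeout > 0 ? taskTimeout / 2 : 0;
    }

    /** True if an attempt assigned at assignedAtMs should not be started at nowMs. */
    static boolean shouldAbandonLaunch(Properties conf, long assignedAtMs, long nowMs) {
        long limit = launchTimeoutMs(conf);
        return limit > 0 && nowMs - assignedAtMs > limit;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(launchTimeoutMs(conf));                 // 300000
        System.out.println(shouldAbandonLaunch(conf, 0, 200_000)); // false
        System.out.println(shouldAbandonLaunch(conf, 0, 400_000)); // true
        conf.setProperty("mapred.task.timeout", "0");              // timeout disabled
        System.out.println(shouldAbandonLaunch(conf, 0, 400_000)); // false
    }
}
```

Reusing the existing property keeps the number of knobs unchanged, at the cost of coupling the launch limit to the progress-report timeout.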
                
> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, mr-5110-tt-only.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules another attempt. The earlier attempt can then start after the later one, leading to two parallel attempts running at the same time. This is particularly an issue if the user turns off speculation and expects only a single attempt of a task to run at any point in time.
