hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arpit Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4749) Killing multiple attempts of a task taker longer as more attempts are killed
Date Thu, 01 Nov 2012 21:54:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489062#comment-13489062
] 

Arpit Gupta commented on MAPREDUCE-4749:
----------------------------------------

The approach that is now taken is

There is a queue that holds all active actions and an array list that holds all tainted actions.
When the processing starts we traverse over the tainted actions and if they are clean they
get added to the active cleanup queue and removed from the tainted list. Then if the active
action queue is empty we sleep for 2s and then re process. If not we go ahead and process
the action, if the action being processed happens to be localizing then it gets added to the
tainted actions list. 

We also use another data structure to keep track of what actions are in the queue in order
to avoid duplicates.
                
> Killing multiple attempts of a task taker longer as more attempts are killed
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4749
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4749
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Arpit Gupta
>            Assignee: Arpit Gupta
>         Attachments: MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch
>
>
> The following was noticed on a mr job running on hadoop 1.1.0
> 1. Start an mr job with 1 mapper
> 2. Wait for a min
> 3. Kill the first attempt of the mapper and then subsequently kill the other 3 attempts
in order to fail the job
> The time taken to kill the task grew exponentially.
> 1st attempt was killed immediately.
> 2nd attempt took a little over a min
> 3rd attempt took approx. 20 mins
> 4th attempt took around 3 hrs.
> The command used to kill the attempt was "hadoop job -fail-task"
> Note that the command returned immediately as soon as the fail attempt was accepted but
the time the attempt was actually killed was as stated above.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message