hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days
Date Fri, 09 Nov 2012 20:15:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494279#comment-13494279 ]

Robert Joseph Evans commented on MAPREDUCE-4751:
------------------------------------------------

I have been doing a quick once-over on this, and I have a few comments.

# I think it would be cleaner for KillWaitAttemptKilledTransition to have a constructor that
takes a TaskAttemptCompletionEventStatus, instead of having the subclasses set it directly
themselves.
# Remove the commented out if statement.
# I am not sure that HashSet is the correct data type for success, failed, etc.  They are likely
to be sparse arrays with small amounts of data in them.  Probably not very important, but
if there are thousands of tasks the overhead starts to add up.
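
A minimal sketch of what the first comment is asking for, not the code in the attached patch. The enum below is a hypothetical stand-in for the real TaskAttemptCompletionEventStatus, the two subclass names and the getFinalStatus() accessor are assumptions made for illustration, and the real transition would also implement the state machine's transition interface.

```java
// Hypothetical stand-in for the real TaskAttemptCompletionEventStatus type.
enum TaskAttemptCompletionEventStatus { FAILED, KILLED, SUCCEEDED }

class KillWaitAttemptKilledTransition {
    // Fixed at construction, so a subclass cannot forget to set it or overwrite it later.
    private final TaskAttemptCompletionEventStatus finalStatus;

    // Default case: the attempt was killed while the task was in KILL_WAIT.
    KillWaitAttemptKilledTransition() {
        this(TaskAttemptCompletionEventStatus.KILLED);
    }

    // Subclasses pass their status here instead of assigning a field directly.
    protected KillWaitAttemptKilledTransition(TaskAttemptCompletionEventStatus finalStatus) {
        this.finalStatus = finalStatus;
    }

    TaskAttemptCompletionEventStatus getFinalStatus() {
        return finalStatus;
    }
}

// Hypothetical subclass names, used only to show the constructor chaining.
class KillWaitAttemptFailedTransition extends KillWaitAttemptKilledTransition {
    KillWaitAttemptFailedTransition() {
        super(TaskAttemptCompletionEventStatus.FAILED);
    }
}

class KillWaitAttemptSucceededTransition extends KillWaitAttemptKilledTransition {
    KillWaitAttemptSucceededTransition() {
        super(TaskAttemptCompletionEventStatus.SUCCEEDED);
    }
}
```

With this shape the status field can be final, and each subclass shrinks to a one-line constructor.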

Overall it looks OK.  I would like to see more tests, though.
                
> AM stuck in KILL_WAIT for days
> ------------------------------
>
>                 Key: MAPREDUCE-4751
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.23.3, 2.0.2-alpha
>            Reporter: Ravi Prakash
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg
>
>
> We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING.
> When you go to the AM, it shows the job in the KILL_WAIT state, with a few maps running. All these
> maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are
> in the FAIL_CONTAINER_CLEANUP state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
