hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4982) AM hung with one pending map task
Date Thu, 07 Feb 2013 22:55:13 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4982:
----------------------------------

    Affects Version/s:     (was: 2.0.3-alpha)

I see convincing evidence in the AM log that what I suspected is true.  There was one fewer
"Assigned from earlierFailedMaps" entry in the log than there were failed map attempts that
received containers.  One of those attempts was allocated a normal-priority container, although
from reading the code I'm not sure how.
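
The check above amounts to counting log entries.  A minimal Python sketch, assuming the AM log
is available as plain text lines and that each fast-fail assignment writes a line containing the
string "Assigned from earlierFailedMaps"; the number of failed map attempts that received
containers is a hypothetical input that has to be obtained separately:

```python
from typing import Iterable

# Log text quoted in the JIRA comment; assumed to appear once per fast-fail assignment.
MARKER = "Assigned from earlierFailedMaps"


def count_fast_fail_assignments(log_lines: Iterable[str]) -> int:
    """Count AM log lines that record an assignment from earlierFailedMaps."""
    return sum(1 for line in log_lines if MARKER in line)


def missing_fast_fail_assignments(log_lines: Iterable[str],
                                  failed_attempts_with_containers: int) -> int:
    """Return how many container grants to failed map attempts lack a matching
    earlierFailedMaps log entry.

    `failed_attempts_with_containers` is a hypothetical input that must be
    gathered separately (e.g., from task attempt history).
    """
    return failed_attempts_with_containers - count_fast_fail_assignments(log_lines)
```

Called on an open log file, a result of 1 would correspond to the single missing entry
described above.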

Originally I thought trunk and 2.0.3-alpha would have the same issue, but I think MAPREDUCE-4893
inadvertently fixes this scenario.  It changed the logic so that the allocator first tries to
assign containers that don't need locality (i.e., fast-fail map and reducer priority containers),
then falls through to assigning them to normal maps if it still hasn't found an assignment.
Before that change, a fast-fail container was thrown away if no fast-fail map was around to take
it.  There's an assert in the code indicating that only normal-priority map containers are
expected, but from what I've seen, fast-fail maps can somehow steal a normal-priority container
on occasion.  That leaves a subsequent fast-fail container to be assigned to the normal map
attempt it was stolen from.
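
The difference between the two assignment paths can be illustrated with a toy model.  This is a
simplified Python sketch, not the actual AM allocator code: the priority labels, the `assign` and
`run_scenario` helpers, and the task names are all hypothetical, locality is ignored, and the
"steal" of a normal-priority container is injected directly because its cause is unknown.

```python
from collections import deque
from typing import Optional

# Hypothetical priority labels for this toy model.
FAST_FAIL_MAP, REDUCE, MAP = "fast_fail_map", "reduce", "map"


def assign(priority: str, failed_maps: deque, reduces: deque, maps: deque,
           fall_through: bool) -> Optional[str]:
    """Return the task attempt that receives a container of `priority`,
    or None if the container is released unused."""
    # First, containers assigned without locality: fast-fail maps and reduces.
    if priority == FAST_FAIL_MAP and failed_maps:
        return failed_maps.popleft()
    if priority == REDUCE and reduces:
        return reduces.popleft()
    # Normal map assignment. Without fall-through, only MAP-priority containers get here.
    if (priority == MAP or fall_through) and maps:
        return maps.popleft()
    # Nothing matched: the container is thrown away.
    return None


def run_scenario(fall_through: bool) -> tuple[list[str], list[str]]:
    failed_maps = deque(["failed_map_retry"])  # rerun of a failed map (fast-fail request)
    reduces: deque = deque()
    maps = deque(["normal_map"])  # normal map with an outstanding MAP-priority request
    assigned = []

    # Anomaly seen in the AM log: the MAP-priority container went to the failed map.
    assigned.append(failed_maps.popleft())

    # Later, the FAST_FAIL_MAP container granted for the failed map's request arrives.
    task = assign(FAST_FAIL_MAP, failed_maps, reduces, maps, fall_through)
    if task is not None:
        assigned.append(task)

    # Assumption: the RM treats both requests as satisfied, so anything left here hangs.
    return assigned, list(maps)


if __name__ == "__main__":
    for ft in (False, True):
        assigned, pending = run_scenario(ft)
        print(f"fall_through={ft}: assigned={assigned} pending={pending}")
```

With `fall_through=False` (the pre-MAPREDUCE-4893 behavior as described above) the fast-fail
container is released and `normal_map` is left pending, matching the hang; with
`fall_through=True` the same container is handed to `normal_map`.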
                
> AM hung with one pending map task
> ---------------------------------
>
>                 Key: MAPREDUCE-4982
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4982
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 0.23.6
>            Reporter: Jason Lowe
>
> Saw a job that hung with one pending map task that never ran.  The task was in the SCHEDULED
> state with a single attempt that was in the UNASSIGNED state.  The AM looked like it was waiting
> for a container from the RM, but the RM was never granting it the one container it needed.
> I suspect the AM botched the container request bookkeeping somehow.  More details to
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
