hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4813) AM timing out during job commit
Date Fri, 30 Nov 2012 01:41:59 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4813:
----------------------------------

    Attachment: MAPREDUCE-4813.patch

Thanks for the review, Vinod!  I've attached a patch that hopefully addresses most of your
comments.

I agree that abortJob, setupJob, etc. need to be handled as well, since those could also take an arbitrary
amount of time.  Adding a new top-level service, associated events for that service,
and new state machine wait states will be a bit involved, and I'm keen on getting a fix for
the now common case of long job commits.  If it's OK with you, I'd like to tackle that review
comment in a separate JIRA.
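The pattern described above (a separate service, events reported back to the job, and a state-machine wait state) can be sketched roughly as follows for the commit case. This is a minimal illustration under stated assumptions, not the attached patch: `AsyncCommitSketch`, `Committer`, `EventType`, `JobState`, and the method names are all hypothetical, and a `synchronized` monitor stands in for the JobImpl write lock.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AsyncCommitSketch {
    // Hypothetical event and state names; not the real Hadoop classes.
    enum EventType { COMMIT_COMPLETED, COMMIT_FAILED }
    enum JobState { RUNNING, COMMITTING, SUCCEEDED, FAILED }

    // Hypothetical stand-in for an output committer's commitJob().
    interface Committer { void commitJob() throws Exception; }

    static final class Event {
        final EventType type;
        final String detail;
        Event(EventType type, String detail) { this.type = type; this.detail = detail; }
    }

    // Dispatcher queue: the job state machine consumes events from here.
    private final BlockingQueue<Event> dispatcher = new LinkedBlockingQueue<>();
    // Dedicated "committer service" thread, so a slow commit never runs under the job lock.
    private final ExecutorService committerService = Executors.newSingleThreadExecutor();
    private JobState state = JobState.RUNNING;

    // Committer service: run commitJob off the state-machine thread, then report back.
    private void handleCommitRequest(Committer committer) {
        committerService.submit(() -> {
            try {
                committer.commitJob();
                dispatcher.add(new Event(EventType.COMMIT_COMPLETED, "ok"));
            } catch (Exception e) {
                dispatcher.add(new Event(EventType.COMMIT_FAILED, e.toString()));
            }
        });
    }

    // Transition into the COMMITTING wait state: it only hands off the work,
    // so the lock (here, the monitor) is held just briefly.
    synchronized void onAllTasksDone(Committer committer) {
        state = JobState.COMMITTING;
        handleCommitRequest(committer);
    }

    // Completion event moves the job out of the wait state.
    synchronized void onEvent(Event e) {
        if (state != JobState.COMMITTING) {
            return;
        }
        state = (e.type == EventType.COMMIT_COMPLETED) ? JobState.SUCCEEDED : JobState.FAILED;
    }

    // A heartbeat-like reader can call this at any time during the commit.
    synchronized JobState getState() { return state; }

    public static JobState runDemo(Committer committer) throws InterruptedException {
        AsyncCommitSketch job = new AsyncCommitSketch();
        job.onAllTasksDone(committer);
        Event e = job.dispatcher.poll(5, TimeUnit.SECONDS);
        if (e != null) {
            job.onEvent(e);
        }
        job.committerService.shutdown();
        return job.getState();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(() -> Thread.sleep(200)));
        System.out.println(runDemo(() -> { throw new java.io.IOException("namenode busy"); }));
    }
}
```

The design choice being illustrated is that the transition into the wait state returns immediately; the potentially long `commitJob` call runs on its own thread, and only the short state update on completion needs the job's lock.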
                
> AM timing out during job commit
> -------------------------------
>
>                 Key: MAPREDUCE-4813
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4813
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4813.patch, MAPREDUCE-4813.patch, MAPREDUCE-4813.patch
>
>
> The AM calls the output committer's {{commitJob}} method synchronously during JobImpl
> state transitions, which means the JobImpl write lock is held the entire time the job is being
> committed.  Holding the write lock prevents the RM allocator thread from heartbeating to the
> RM.  Therefore if committing the job takes too long (e.g., the job has tons of files to commit
> and/or the namenode is bogged down), the AM appears to be unresponsive to the RM and the
> RM kills the AM attempt.
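The lock starvation described above can be reproduced in miniature with a `ReentrantReadWriteLock`. This sketch assumes, as the description implies, that the allocator's heartbeat path needs the job's read lock; `CommitLockDemo`, `slowCommit`, `heartbeat`, and the 1000 ms / 100 ms timings are hypothetical choices for illustration, not Hadoop code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CommitLockDemo {
    // Hypothetical stand-in for the JobImpl read/write lock.
    private final ReentrantReadWriteLock jobLock = new ReentrantReadWriteLock();

    // Simulates a slow commitJob() (many files to commit and/or a busy namenode).
    private static void slowCommit(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Simulates the allocator heartbeat: by assumption it reads job state under
    // the read lock, and gives up if the lock is not available in time.
    private boolean heartbeat(long timeoutMillis) throws InterruptedException {
        if (!jobLock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // the RM would see no heartbeat from this AM
        }
        try {
            return true; // a real AM would contact the RM here
        } finally {
            jobLock.readLock().unlock();
        }
    }

    // Runs one state transition that commits either while holding the write
    // lock (the reported bug) or after releasing it, and reports whether a
    // concurrent heartbeat got the read lock within 100 ms.
    public static boolean heartbeatSucceeds(boolean commitUnderLock) throws InterruptedException {
        CommitLockDemo am = new CommitLockDemo();
        CountDownLatch transitionStarted = new CountDownLatch(1);
        Thread stateMachine = new Thread(() -> {
            am.jobLock.writeLock().lock();
            try {
                transitionStarted.countDown();
                if (commitUnderLock) {
                    slowCommit(1000); // commit inside the locked transition
                }
            } finally {
                am.jobLock.writeLock().unlock();
            }
            if (!commitUnderLock) {
                slowCommit(1000); // commit with no job lock held
            }
        });
        stateMachine.start();
        transitionStarted.await();
        boolean ok = am.heartbeat(100);
        stateMachine.join();
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("commit under write lock, heartbeat ok? " + heartbeatSucceeds(true));
        System.out.println("commit outside the lock, heartbeat ok? " + heartbeatSucceeds(false));
    }
}
```

With these timings the first scenario is expected to report `false` (the heartbeat times out while the commit holds the write lock) and the second `true`.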

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
