hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory
Date Tue, 10 Apr 2012 21:09:20 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251064#comment-13251064 ]

Siddharth Seth commented on MAPREDUCE-4099:
-------------------------------------------

Bobby, Jason - the cleanup shouldn't be called before the services are stopped, since it may
end up removing history files before they're moved over. It needs to happen after history.stop
and before RMCommunicator.stop.

The current patch will work in most cases because of the 5 second sleep, but it can still go wrong.

Also, the original patch may be quite useful (as a separate jira) - it makes writing an AM a
little easier.
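
The ordering described above can be sketched roughly as follows. This is only an
illustration, not the actual MRAppMaster code: the Service interface, the
cleanupStaging step and the step names are hypothetical stand-ins for the history
service, the staging directory removal and the RM communicator.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownOrderSketch {
    /** Hypothetical stand-in for a stoppable AM service (not the YARN Service API). */
    interface Service {
        void stop();
    }

    /** Runs the three shutdown steps in the suggested order and returns the steps taken. */
    static List<String> shutdown() {
        List<String> steps = new ArrayList<>();

        // Assumed behavior: stopping the history service moves history files
        // out of the staging directory.
        Service history = () -> steps.add("history.stop");
        // Assumed behavior: deletes the staging directory.
        Runnable cleanupStaging = () -> steps.add("cleanupStagingDir");
        // Assumed behavior: stopping the RM communicator tells the RM the AM is
        // finished, after which the RM may kill the AM container at any time.
        Service rmCommunicator = () -> steps.add("rmCommunicator.stop");

        history.stop();          // 1. history files are safe before cleanup
        cleanupStaging.run();    // 2. cleanup happens while the AM is still alive
        rmCommunicator.stop();   // 3. only now may the RM kill the container
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(shutdown());
    }
}
```

With this order, the cleanup neither deletes history files that have not been moved
yet nor races against the RM killing the container, so no sleep is needed.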
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch
>
>
> When the ApplicationMaster shuts down it's supposed to remove the staging directory,
> assuming properties weren't set to override this behavior. During shutdown the AM tells the
> ResourceManager that it has finished before it cleans up the staging directory. However upon
> hearing the AM has finished, the RM turns right around and kills the AM container. If the
> AM is too slow, the AM will be killed before the staging directory is removed.
> We're seeing the AM lose this race fairly consistently on our clusters, and the lack
> of staging directory cleanup quickly leads to filesystem quota issues for some users.


        
