hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3159) DefaultContainerExecutor removes appcache dir on every localization
Date Thu, 20 Oct 2011 08:15:10 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131440#comment-13131440 ]

Siddharth Seth commented on MAPREDUCE-3159:

The new Application state transition (INITING to FINISHED on APP_INIT_FAILED) will cause problems
with subsequent startContainer() calls (new containers will be stuck in the NEW state) and with
finishApplication() calls. As the patch notes, this needs an additional state, which would have to
handle new container and finishApp requests. Also, an intermittent failure in app initialization
would end up making the node unusable for that specific app.
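
To make the stuck-container concern concrete, here is a toy model of the transition. It is only a sketch: AppStateSketch, its enums and its methods are hypothetical and are not the NodeManager's actual classes. It assumes containers leave NEW only once the app has finished initializing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the real NodeManager state machine.
final class AppStateSketch {
    enum AppState { INITING, RUNNING, FINISHED }
    enum ContainerState { NEW, LOCALIZING }

    AppState state = AppState.INITING;
    final List<ContainerState> containers = new ArrayList<>();

    // Registers a container and returns its index. Assumption: a container
    // only moves past NEW when the app is already RUNNING.
    int startContainer() {
        containers.add(state == AppState.RUNNING
                ? ContainerState.LOCALIZING
                : ContainerState.NEW);
        return containers.size() - 1;
    }

    // Successful init: the app starts running and waiting containers move on.
    void appInited() {
        if (state == AppState.INITING) {
            state = AppState.RUNNING;
            containers.replaceAll(c -> ContainerState.LOCALIZING);
        }
    }

    // The transition under discussion: INITING -> FINISHED on APP_INIT_FAILED.
    void appInitFailed() {
        if (state == AppState.INITING) {
            state = AppState.FINISHED;
        }
    }

    public static void main(String[] args) {
        AppStateSketch app = new AppStateSketch();
        app.appInitFailed();
        int id = app.startContainer();
        // Nothing will ever move this container forward.
        System.out.println(app.containers.get(id)); // prints NEW
    }
}
```

In this model the FINISHED app never emits the event that would advance the container, which is why an extra state that explicitly handles late startContainer() and finishApplication() requests would be needed.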

Changing DCE to remove the delete in {code}createAppDirs{code} is probably a simpler fix.
job.jar and job.xml are separate App resources, each localized into its own directory.
One failing should not affect the other (it will only affect the container associated with the
failed localization attempt). I don't think the comment in the code about cleaning up the dir
is valid.
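
A rough sketch of what creating the app dir without the delete could look like, using plain java.nio rather than the Hadoop filesystem APIs that DCE actually uses. The class name, the ensureAppDir helper and the simplified <localDir>/appcache/<appId> layout are assumptions for illustration, not the real createAppDirs code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: hypothetical helper, not DefaultContainerExecutor itself.
final class AppDirsSketch {

    // Ensures <localDir>/appcache/<appId> exists without deleting it first.
    // Files.createDirectories does nothing if the directory already exists,
    // so a second localization leaves earlier resources untouched.
    static Path ensureAppDir(Path localDir, String appId) throws IOException {
        Path appDir = localDir.resolve("appcache").resolve(appId);
        return Files.createDirectories(appDir);
    }

    public static void main(String[] args) throws IOException {
        Path localDir = Files.createTempDirectory("nm-local");

        // First localization: create the app dir and place a resource in it.
        Path appDir = ensureAppDir(localDir, "application_0001");
        Path jobJar = Files.createFile(appDir.resolve("job.jar"));

        // Second localization for the same app: no delete, so job.jar survives.
        ensureAppDir(localDir, "application_0001");
        System.out.println(Files.exists(jobJar)); // prints true
    }
}
```

With a delete before the create, the second call would remove job.jar while a concurrently running task of the same job may still be using it.
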
> DefaultContainerExecutor removes appcache dir on every localization
> -------------------------------------------------------------------
>                 Key: MAPREDUCE-3159
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3159
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>         Attachments: mr-3159.txt, mr-3159.txt
> The DefaultContainerExecutor currently has code that removes the application dir from
> appcache/ in the local directories on every task localization. This causes any concurrent
> executing tasks from the same job to fail.
