hadoop-mapreduce-issues mailing list archives

From "Ravi Gummadi (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3121) NodeManager should handle disk-failures
Date Tue, 15 Nov 2011 03:50:52 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150221#comment-13150221 ]

Ravi Gummadi commented on MAPREDUCE-3121:
-----------------------------------------

>> Should there be a check for whether there are any good dirs left in ResourceLocalizationService
before starting of localizing the resources?

If there are no good local dirs available, then the previous line of code
{code}nmPrivateCTokensPath = diskHandler.getLocalPathForWrite(){code} will throw an IOException.
So I think checking again is unnecessary, unless there is a race condition (a disk failure
is detected just before the call to startLocalizer(), which is very unlikely).
Even with the current patch, if there are no good local dirs, startLocalizer()
will fail and throw an exception anyway.

Anyway, I will add a diskHandler.areDisksHealthy() check in the next version of the patch.
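A minimal Java sketch of the check discussed above, assuming a hypothetical DiskHandlerSketch class standing in for the NodeManager's diskHandler. Its method names mirror those mentioned in the comment, but it is not Hadoop's actual implementation, and the "nmPrivate/container.tokens" path and LocalizerSketch.startLocalization() wrapper are illustrative only. getLocalPathForWrite() already throws an IOException when no good dirs remain; the extra areDisksHealthy() re-check just before starting the localizer narrows, but does not close, the race window.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the NodeManager's disk handler (not the real Hadoop class). */
class DiskHandlerSketch {
  private final List<String> goodDirs = new ArrayList<>();

  DiskHandlerSketch(List<String> localDirs) {
    goodDirs.addAll(localDirs);
  }

  /** Invoked by a (hypothetical) periodic disk checker when a local dir goes bad. */
  synchronized void markFailed(String dir) {
    goodDirs.remove(dir);
  }

  /** True while at least one local dir is still usable. */
  synchronized boolean areDisksHealthy() {
    return !goodDirs.isEmpty();
  }

  /** Returns a path under the first good dir, or throws if every dir has failed. */
  synchronized String getLocalPathForWrite(String relativePath) throws IOException {
    if (goodDirs.isEmpty()) {
      throw new IOException("No good local directories left");
    }
    return new File(goodDirs.get(0), relativePath).getPath();
  }
}

/** Hypothetical localization step showing where the re-check would sit. */
class LocalizerSketch {
  static String startLocalization(DiskHandlerSketch diskHandler) throws IOException {
    // Already fails with IOException if no good local dirs are left.
    String tokensPath = diskHandler.getLocalPathForWrite("nmPrivate/container.tokens");
    // Re-check right before starting the localizer; a disk could fail in between,
    // so this only shrinks the race window rather than eliminating it.
    if (!diskHandler.areDisksHealthy()) {
      throw new IOException("All local dirs failed before starting the localizer");
    }
    return tokensPath; // placeholder for the real startLocalizer(...) call
  }
}
```

Because a failure can still slip in after the re-check, startLocalizer() itself must still be prepared to fail, which is the point made in the comment.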
                
> NodeManager should handle disk-failures
> ---------------------------------------
>
>                 Key: MAPREDUCE-3121
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3121
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3121.patch, 3121.v1.1.patch, 3121.v1.patch
>
>
> This is akin to MAPREDUCE-2413 but for YARN's NodeManager. We want to minimize the impact
> of transient/permanent disk failures on containers. With a larger number of disks per node,
> the ability to continue to run containers on other disks is crucial.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
