hadoop-yarn-issues mailing list archives

From "Hou Song (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
Date Fri, 01 Nov 2013 02:08:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810961#comment-13810961
] 

Hou Song commented on YARN-90:
------------------------------

Hi guys, I have been running my own patch for this issue for a long time. It enables the NM to
reuse failed disks after they come back, and it also adds a new metric for the number of failed
directories, so people have a clearer view from outside.
For unit tests, I added a test to TestLocalDirsHandlerService that mimics disk failure with "chmod
000 failed_dir" and mimics disk repair by restoring the permissions (e.g. "chmod 755 failed_dir").
If anyone is interested, I can post this patch here.
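
To illustrate the idea in the comment above, here is a minimal sketch of a tracker that re-checks failed directories on every pass, so that a directory which becomes usable again moves back to the good list. It is not the code from the patch: the class DirTracker, its methods, and the injectable health check are all hypothetical names chosen for this example. The health check is passed in as a predicate so that a test can flip a directory between failed and repaired without depending on file permissions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not the class used by the patch.
class DirTracker {
    private List<String> goodDirs = new ArrayList<>();
    private List<String> failedDirs = new ArrayList<>();
    private final Predicate<String> isHealthy;

    DirTracker(List<String> dirs, Predicate<String> isHealthy) {
        this.goodDirs.addAll(dirs);
        this.isHealthy = isHealthy;
    }

    // A plausible default check: the path is a directory the process can
    // read, write and traverse. Note that a process running as root usually
    // passes this check even after "chmod 000".
    static boolean isUsable(String dir) {
        File f = new File(dir);
        return f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute();
    }

    // Re-check every known directory, including those that failed earlier.
    // Returns true if any directory changed state during this pass.
    boolean checkDirs() {
        List<String> all = new ArrayList<>(goodDirs);
        all.addAll(failedDirs);
        List<String> newGood = new ArrayList<>();
        List<String> newFailed = new ArrayList<>();
        for (String dir : all) {
            if (isHealthy.test(dir)) {
                newGood.add(dir);
            } else {
                newFailed.add(dir);
            }
        }
        boolean changed = !newGood.equals(goodDirs) || !newFailed.equals(failedDirs);
        goodDirs = newGood;
        failedDirs = newFailed;
        return changed;
    }

    List<String> getGoodDirs() {
        return Collections.unmodifiableList(goodDirs);
    }

    // The value a "number of failed directories" metric could report.
    int getNumFailedDirs() {
        return failedDirs.size();
    }
}
```

Calling checkDirs() periodically with DirTracker::isUsable as the predicate would approximate the behaviour described: a directory that fails is counted in getNumFailedDirs(), and once its permissions are restored the next pass returns it to getGoodDirs() without a restart.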

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>         Attachments: YARN-90.1.patch, YARN-90.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it
> is marked as failed forever. To reuse that disk (after it becomes good again), NodeManager needs
> a restart. This JIRA is to improve NodeManager to reuse good disks (which could have been bad
> some time back).



--
This message was sent by Atlassian JIRA
(v6.1#6144)
