hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
Date Tue, 05 Nov 2013 18:19:19 GMT

    [ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814090#comment-13814090 ]

Vinod Kumar Vavilapalli commented on YARN-90:
---------------------------------------------

bq. However, I don't quite understand your saying "expose this end-to-end and not just metrics". We have been using the failed-disk metric in our production cluster for a year, and it's good enough for our rapid disk repair. Enlighten me if you have a better way.

I meant that it should be part of the client-side RPC report and JMX, as well as the metrics. Doing only one of those is incomplete, so I was suggesting that we do all of that in a separate JIRA.

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it
> is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs
> a restart. This JIRA is to improve NodeManager so that it reuses disks that have become
> good again (even if they were marked bad some time back).
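The improvement described above amounts to re-testing previously failed directories on every health-check pass instead of excluding them permanently. The following is a minimal sketch under stated assumptions: the class `DiskRecheckSketch` and its methods are hypothetical and are not the NodeManager classes from the attached patches, and `isHealthy` is a simplified stand-in for a real disk check.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not the actual NodeManager code): tracks local dirs as
 * good or failed and re-checks failed dirs on every pass, so a dir that becomes
 * healthy again is returned to service without a NodeManager restart.
 */
public class DiskRecheckSketch {
    private final List<File> allDirs;
    private final Set<File> goodDirs = new LinkedHashSet<>();
    private final Set<File> failedDirs = new LinkedHashSet<>();

    public DiskRecheckSketch(List<File> dirs) {
        this.allDirs = new ArrayList<>(dirs);
        this.goodDirs.addAll(dirs); // assume healthy until the first check
    }

    /** Simplified health test (assumption): dir exists and is readable, writable, searchable. */
    static boolean isHealthy(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /**
     * Re-tests every configured dir, including ones marked failed on the
     * previous pass. Returns true if the set of good dirs changed.
     */
    public synchronized boolean checkDirs() {
        Set<File> before = new LinkedHashSet<>(goodDirs);
        goodDirs.clear();
        failedDirs.clear();
        for (File dir : allDirs) {
            if (isHealthy(dir)) {
                goodDirs.add(dir); // a previously failed dir can come back here
            } else {
                failedDirs.add(dir);
            }
        }
        return !before.equals(goodDirs);
    }

    public synchronized List<File> getGoodDirs() {
        return Collections.unmodifiableList(new ArrayList<>(goodDirs));
    }

    public synchronized List<File> getFailedDirs() {
        return Collections.unmodifiableList(new ArrayList<>(failedDirs));
    }

    public static void main(String[] args) throws Exception {
        File base = Files.createTempDirectory("nm-local").toFile();
        File d1 = new File(base, "d1");
        File d2 = new File(base, "d2");
        d1.mkdir(); // d2 does not exist yet, so it is reported as failed
        DiskRecheckSketch dirs = new DiskRecheckSketch(Arrays.asList(d1, d2));
        dirs.checkDirs();
        System.out.println("failed: " + dirs.getFailedDirs().size()); // prints "failed: 1"
        d2.mkdir(); // the "repaired" disk comes back
        dirs.checkDirs();
        System.out.println("failed: " + dirs.getFailedDirs().size()); // prints "failed: 0"
    }
}
```

Recomputing both sets from scratch on each pass keeps the logic simple: there is no separate "recovery" path, so a failed directory that passes the check is treated exactly like one that never failed. The boolean return value could be used to trigger whatever needs to react to a changed set of usable disks.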



--
This message was sent by Atlassian JIRA
(v6.1#6144)
