hadoop-yarn-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
Date Fri, 03 Oct 2014 22:59:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158643#comment-14158643 ]

Ming Ma commented on YARN-90:
-----------------------------

Thanks, Varun.

The main question about the UNHEALTHY state is whether this patch makes a node more likely
to become unhealthy, now that "full disk" has been added as one of the conditions. Since
[~jira.shegalov]'s YARN-1996 and [~sjlee0]'s MAPREDUCE-5817 suggest ways to mitigate the
impact of UNHEALTHY nodes on existing containers and MR task scheduling, this might not be
an issue.

Nit: in "Set<String> postCheckFullDirs = new HashSet<String>(fullDirs);", there is no need
to create postCheckFullDirs. The code can refer to fullDirs directly later.
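
To illustrate the nit, here is a minimal sketch, assuming fullDirs is only read (never modified) after the disk check. Only the names fullDirs and postCheckFullDirs come from the patch; the class FullDirsExample, the method newlyFull and the parameter preCheckFullDirs are hypothetical and are not taken from the actual patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the patch's logic, for illustration only.
class FullDirsExample {

    // Returns the directories that are full now but were not full before the check.
    // It iterates over fullDirs directly instead of first copying it into a
    // postCheckFullDirs set; this is safe under the assumption that fullDirs
    // is not modified while this method runs.
    static Set<String> newlyFull(Set<String> preCheckFullDirs, Set<String> fullDirs) {
        Set<String> result = new HashSet<String>();
        for (String dir : fullDirs) {
            if (!preCheckFullDirs.contains(dir)) {
                result.add(dir);
            }
        }
        return result;
    }
}
```

A defensive copy would only be needed if fullDirs could change between the check and its later use.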

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>            Assignee: Varun Vasudev
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch,
> apache-yarn-90.0.patch, apache-yarn-90.1.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch,
> apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch,
> apache-yarn-90.8.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it
> is marked as failed forever. To reuse that disk after it becomes good again, NodeManager
> must be restarted. This JIRA is to improve NodeManager so that it reuses disks that have
> become good again after an earlier failure.
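
The behaviour requested in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the NodeManager's implementation: the class DirTracker, its methods and the pluggable isHealthy check are all hypothetical names. The point is only that each periodic check re-tests previously failed directories as well as good ones, so a repaired disk comes back into use without a restart.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; not the NodeManager's actual classes or method names.
class DirTracker {
    private final Set<String> goodDirs = new LinkedHashSet<String>();
    private final Set<String> failedDirs = new LinkedHashSet<String>();
    private final Predicate<String> isHealthy; // assumed pluggable disk health check

    DirTracker(List<String> dirs, Predicate<String> isHealthy) {
        this.goodDirs.addAll(dirs);
        this.isHealthy = isHealthy;
    }

    // Re-tests every known directory, including ones that failed earlier,
    // and sorts each into goodDirs or failedDirs based on the current result.
    void checkDirs() {
        List<String> all = new ArrayList<String>(goodDirs);
        all.addAll(failedDirs);
        goodDirs.clear();
        failedDirs.clear();
        for (String dir : all) {
            if (isHealthy.test(dir)) {
                goodDirs.add(dir);
            } else {
                failedDirs.add(dir);
            }
        }
    }

    Set<String> getGoodDirs() { return goodDirs; }

    Set<String> getFailedDirs() { return failedDirs; }
}
```

Calling checkDirs() on a timer would then move a directory from the failed set back to the good set once the health check passes again.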



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
