hadoop-yarn-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
Date Wed, 15 Oct 2014 06:09:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172050#comment-14172050 ]

Ming Ma commented on YARN-90:
-----------------------------

Thanks Varun.

You and Jason discussed the disk cleanup scenario. It would be useful to clarify whether the
following scenario will be resolved by this JIRA or whether a separate JIRA is necessary.

1. A disk becomes read-only, so DiskChecker marks it as DiskErrorCause.OTHER.
2. Later the disk is repaired and becomes good again. There is still data left on the disk.
3. Given that this data comes from old containers that have finished, who will clean it up?
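
To make the question concrete, here is a rough sketch of what such a cleanup could look
like when a disk comes back. It is not taken from the patch: the class name, the method
names, and the assumption that each direct child of a local dir is named after a container
id are all hypothetical, and it uses only the JDK rather than any NodeManager classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the YARN-90 patch.
public class RecoveredDiskCleanup {

    /**
     * Deletes every direct child of recoveredDir whose name is not in
     * activeContainerIds and returns the names that were removed.
     * Assumption: each child directory is named after a container id.
     */
    public static List<String> cleanStaleContainerDirs(
            Path recoveredDir, Set<String> activeContainerIds) throws IOException {
        List<Path> children;
        try (Stream<Path> listing = Files.list(recoveredDir)) {
            children = listing.collect(Collectors.toList());
        }
        List<String> removed = new ArrayList<>();
        for (Path child : children) {
            String name = child.getFileName().toString();
            if (!activeContainerIds.contains(name)) {
                deleteRecursively(child);
                removed.add(name);
            }
        }
        return removed;
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so that children are deleted before their parent.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Whether something like this belongs in this JIRA, and who would own the list of active
containers, is exactly the open question above.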

Nit: disksTurnedBad's parameter is named preCheckDirs; preFailedDirs would be a better name.

In getDisksHealthReport, people can't tell whether a disk failed because it is full or because
it is bad; it might be useful to distinguish the two cases.
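
As an illustration of the kind of report I have in mind, a minimal sketch follows. The
class, the Cause enum, and the message wording are made up for this example (the enum only
mirrors the DISK_FULL / OTHER split and is not the patch's DiskErrorCause); the real
getDisksHealthReport may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the NodeManager implementation.
public class DiskHealthReportSketch {

    // Stand-in for the failure cause; the names are assumptions for this example.
    public enum Cause { DISK_FULL, OTHER }

    /** Builds one report line that lists full dirs and otherwise-failed dirs separately. */
    public static String buildReport(Map<String, Cause> failedDirs) {
        List<String> full = new ArrayList<>();
        List<String> other = new ArrayList<>();
        for (Map.Entry<String, Cause> e : failedDirs.entrySet()) {
            if (e.getValue() == Cause.DISK_FULL) {
                full.add(e.getKey());
            } else {
                other.add(e.getKey());
            }
        }
        return full.size() + " dir(s) full: " + full + "; "
            + other.size() + " dir(s) failed for other reasons: " + other;
    }
}
```

With this split, an operator reading the report can tell a disk that only needs space
freed from one that needs repair.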

Is verifyDirUsingMkdir necessary, given that DiskChecker.checkDir will already check the directory?
 

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>            Assignee: Varun Vasudev
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch,
apache-yarn-90.0.patch, apache-yarn-90.1.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch,
apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch,
apache-yarn-90.8.patch, apache-yarn-90.9.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it
is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs
restart. This JIRA is to improve NodeManager to reuse good disks(which could be bad some time
back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
