hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
Date Wed, 08 Oct 2014 21:39:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164214#comment-14164214 ]

zhihai xu commented on YARN-90:

I looked at the patch. Some nits I found:
1. We can change
    if (!postCheckFullDirs.contains(dir) && postCheckOtherDirs.contains(dir)) {
to
    if (postCheckOtherDirs.contains(dir)) {
because postCheckFullDirs and postCheckOtherDirs are mutually exclusive sets.

2. Same as item 1: change
    if (!postCheckOtherDirs.contains(dir) && postCheckFullDirs.contains(dir)) {
to
    if (postCheckFullDirs.contains(dir)) {
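Here is a small check of why the simplification in items 1 and 2 is safe. When no directory can be in both sets, `!a.contains(dir)` is always true whenever `b.contains(dir)` is true, so the extra clause never changes the result. The sets are modelled here as `Set<String>` with made-up paths, and `ExclusiveSetCheck` is a hypothetical class. The patch's actual collection types aren't shown in the comment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ExclusiveSetCheck {

  // Condition as written in the patch for item 1.
  static boolean otherOriginal(Set<String> full, Set<String> other, String dir) {
    return !full.contains(dir) && other.contains(dir);
  }

  // Condition as written in the patch for item 2.
  static boolean fullOriginal(Set<String> full, Set<String> other, String dir) {
    return !other.contains(dir) && full.contains(dir);
  }

  // True if, for this dir, each original condition equals its simplified form.
  static boolean simplificationHolds(Set<String> full, Set<String> other, String dir) {
    return otherOriginal(full, other, dir) == other.contains(dir)
        && fullOriginal(full, other, dir) == full.contains(dir);
  }

  public static void main(String[] args) {
    // Hypothetical, disjoint example sets: each failed dir is in at most one.
    Set<String> postCheckFullDirs = new HashSet<>(Arrays.asList("/data1"));
    Set<String> postCheckOtherDirs = new HashSet<>(Arrays.asList("/data2"));
    for (String dir : Arrays.asList("/data1", "/data2", "/data3")) {
      System.out.println(dir + ": "
          + simplificationHolds(postCheckFullDirs, postCheckOtherDirs, dir));
    }
  }
}
```

If the sets ever overlapped, the two forms would differ for the shared directory. The simplification relies on the disjointness noted in item 1.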

3. In verifyDirUsingMkdir:
Can we append a counter to the directory name to avoid looping forever (although the chance is very small),
like the following?
    long i = 0L;
    while (target.exists()) {
      randomDirName = RandomStringUtils.randomAlphanumeric(5) + i++;
      target = new File(dir, randomDirName);
    }
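Below is a self-contained sketch of a mkdir-based check that uses the counter from item 3. It is not the patch's code. The boolean return type and the local `randomAlphanumeric` helper are assumptions. The helper stands in for commons-lang's `RandomStringUtils.randomAlphanumeric(5)` so the example needs no extra dependency.

```java
import java.io.File;
import java.security.SecureRandom;

public class MkdirCheck {

  private static final String ALPHANUMERIC =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  private static final SecureRandom RANDOM = new SecureRandom();

  // Hypothetical stand-in for RandomStringUtils.randomAlphanumeric(count).
  static String randomAlphanumeric(int count) {
    StringBuilder sb = new StringBuilder(count);
    for (int k = 0; k < count; k++) {
      sb.append(ALPHANUMERIC.charAt(RANDOM.nextInt(ALPHANUMERIC.length())));
    }
    return sb.toString();
  }

  // Hypothetical sketch: create a fresh subdirectory under dir, then remove it.
  // The prefix is always exactly 5 characters, so each name maps to a single
  // counter value. A fixed set of existing entries can only block finitely
  // many values of i, and the loop therefore terminates.
  static boolean verifyDirUsingMkdir(File dir) {
    String randomDirName = randomAlphanumeric(5);
    File target = new File(dir, randomDirName);
    long i = 0L;
    while (target.exists()) {
      randomDirName = randomAlphanumeric(5) + i++;
      target = new File(dir, randomDirName);
    }
    if (!target.mkdir()) {
      return false; // could not create a directory: treat the disk as bad
    }
    return target.delete(); // clean up; failure to delete also counts as bad
  }

  public static void main(String[] args) {
    File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
    System.out.println(dir + " usable: " + verifyDirUsingMkdir(dir));
  }
}
```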

4. In disksTurnedBad:
Can we add a break in the loop once disksFailed is true, so we exit the loop earlier?
      if (!preCheckDirs.contains(dir)) {
        disksFailed = true;
        break;
      }

5. In disksTurnedGood, same as item 4:
Can we add a break in the loop once disksTurnedGood is true?
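Items 4 and 5 share one pattern: scan a list and stop at the first directory missing from the other list. The sketch below factors that pattern into a hypothetical `anyNotIn` helper with the suggested `break`.

It rests on an assumption, since the comment shows only one line of the loop. The assumption is that both lists hold failed directories before and after a check, and that disksTurnedBad iterates over the post-check list. The method bodies are illustrative, not the patch's code.

```java
import java.util.Arrays;
import java.util.List;

public class DiskTransitionCheck {

  // True as soon as one element of `current` is absent from `previous`.
  // The break skips the remaining elements once the answer is known.
  static boolean anyNotIn(List<String> current, List<String> previous) {
    boolean found = false;
    for (String dir : current) {
      if (!previous.contains(dir)) {
        found = true;
        break;
      }
    }
    return found;
  }

  // Assumption: pre/post lists hold the failed dirs before/after a check.
  // A dir that is failed now but was not failed before has just gone bad.
  static boolean disksTurnedBad(List<String> preCheckDirs, List<String> postCheckDirs) {
    return anyNotIn(postCheckDirs, preCheckDirs);
  }

  // A dir that was failed before but is not failed now has recovered.
  static boolean disksTurnedGood(List<String> preCheckDirs, List<String> postCheckDirs) {
    return anyNotIn(preCheckDirs, postCheckDirs);
  }

  public static void main(String[] args) {
    List<String> pre = Arrays.asList("/data3");
    List<String> post = Arrays.asList("/data1");
    System.out.println(disksTurnedBad(pre, post));  // true: /data1 newly failed
    System.out.println(disksTurnedGood(pre, post)); // true: /data3 recovered
  }
}
```

Returning `true` directly from inside the loop, or using `current.stream().anyMatch(d -> !previous.contains(d))`, would short-circuit the same way. The flag-plus-break form simply mirrors the existing loop structure.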


> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>            Assignee: Varun Vasudev
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch,
> apache-yarn-90.0.patch, apache-yarn-90.1.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch,
> apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch,
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it
> is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs
> restart. This JIRA is to improve NodeManager to reuse good disks(which could be bad some time
This message was sent by Atlassian JIRA
