hadoop-mapreduce-dev mailing list archives

From "Eli Collins (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-3474) NM disk failure detection only covers local dirs
Date Sun, 27 Nov 2011 22:32:40 GMT
NM disk failure detection only covers local dirs 
-------------------------------------------------

                 Key: MAPREDUCE-3474
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3474
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: nodemanager, tasktracker
    Affects Versions: 0.23.0, 0.20.205.0
            Reporter: Eli Collins


This is the MR counterpart to HDFS-1848. Like HDFS volume failure detection, NM disk failure
detection checks only a subset of the disks and a subset of the directories. E.g. the TT and the
NM do not check the root disk for errors unless a local dir resides on it. Even if a local
dir resides on the root disk, the disk checking code only checks the local dirs, so a failure
seen only when accessing a part of the disk not hosting the local dirs will not be noticed.
The disk that hosts the logs, pid, tmp dirs, etc. is critical, so it needs to be checked as
well, and the NM should shut down if a critical disk is not available (to prevent MR issues
similar to HDFS-1848 and HDFS-2095). People currently work around this limitation (aside
from ignoring it) by using RAID-1 for the root disk or a health script that checks
the root disk health.
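
As a rough illustration of the proposal, the check could cover every critical directory
(logs, pid, tmp) in addition to the local dirs. The sketch below is not Hadoop code:
CriticalDirCheck, isHealthy and failedDirs are hypothetical names, and the
write/read/delete probe is an assumed health test, not the NM's actual disk checker.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class CriticalDirCheck {

    /**
     * Returns true if dir is an existing directory and a small probe file
     * can be created in it, written, read back, and deleted.
     */
    static boolean isHealthy(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        Path probe = null;
        try {
            probe = Files.createTempFile(dir, "diskcheck", ".tmp");
            byte[] payload = {0x42};
            Files.write(probe, payload);
            byte[] readBack = Files.readAllBytes(probe);
            return readBack.length == 1 && readBack[0] == payload[0];
        } catch (IOException | SecurityException e) {
            // Any I/O or permission failure is treated as an unhealthy directory.
            return false;
        } finally {
            if (probe != null) {
                try {
                    Files.deleteIfExists(probe);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the probe file.
                }
            }
        }
    }

    /** Returns the directories from dirs that fail the probe, in input order. */
    static List<Path> failedDirs(List<Path> dirs) {
        List<Path> failed = new ArrayList<>();
        for (Path dir : dirs) {
            if (!isHealthy(dir)) {
                failed.add(dir);
            }
        }
        return failed;
    }
}
```

A caller could pass the local dirs together with the log, pid and tmp dirs, and shut the
daemon down if any directory it considers critical appears in the returned list.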

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
