hadoop-yarn-dev mailing list archives

From "Lavkesh Lahngir (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
Date Thu, 07 May 2015 12:52:01 GMT
Lavkesh Lahngir created YARN-3591:
-------------------------------------

             Summary: Resource Localisation on a bad disk causes subsequent containers failure

                 Key: YARN-3591
                 URL: https://issues.apache.org/jira/browse/YARN-3591
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Lavkesh Lahngir


This happens when a resource has been localised on a disk and that disk goes bad afterwards.
The NM keeps the paths of localised resources in memory. When a resource is requested,
isResourcePresent(rsrc) is called, which in turn calls file.exists() on the localised path.

In some cases, after the disk has gone bad, the inodes are still cached and file.exists()
returns true. But at the time of reading, the file cannot be opened.

Note: file.exists() natively calls stat64, which succeeds because it is still able
to find the inode information from the OS, so exists() returns true.

A proposal is to call file.list() on the parent path of the resource, which calls open()
natively. If the disk is good, it should return an array of paths with length at least 1.
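
The proposed check could be sketched roughly as follows. This is only an illustration
of the idea, not the NodeManager's actual code: the class name LocalResourceCheck and
the File-based signature of isResourcePresent are assumptions made for the example. It
relies on the documented behaviour of java.io.File.list(), which returns null when the
path is not a directory or an I/O error occurs.

```java
import java.io.File;

public class LocalResourceCheck {

    // Hypothetical helper: treats a localised resource as present only if
    // its parent directory can actually be listed and contains the entry.
    static boolean isResourcePresent(File localizedPath) {
        File parent = localizedPath.getParentFile();
        if (parent == null) {
            // No parent to list; fall back to the existing exists() check.
            return localizedPath.exists();
        }
        // list() has to read the directory, so a failing disk should
        // surface here as null instead of a stale cached "true".
        String[] children = parent.list();
        if (children == null || children.length == 0) {
            return false;
        }
        // Additionally confirm that the resource itself is among the entries.
        String name = localizedPath.getName();
        for (String child : children) {
            if (child.equals(name)) {
                return true;
            }
        }
        return false;
    }
}
```

Looking for the resource name in the listing goes slightly beyond the proposal, which
only requires the returned array to have length at least 1.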



