hadoop-yarn-issues mailing list archives

From "Tao Yang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-5749) Fail to localize resources after health status for local dirs changed
Date Wed, 19 Oct 2016 08:43:58 GMT

     [ https://issues.apache.org/jira/browse/YARN-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Tao Yang updated YARN-5749:
---------------------------
    Summary: Fail to localize resources after health status for local dirs changed  (was:
Fail to localize resources after health status for local dirs changed occurred by the change
of FileContext#setUMask)

> Fail to localize resources after health status for local dirs changed
> ---------------------------------------------------------------------
>
>                 Key: YARN-5749
>                 URL: https://issues.apache.org/jira/browse/YARN-5749
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Tao Yang
>
> HADOOP-13440 changed the FileContext#setUMask method so that the umask is no longer held in a
local variable but is stored globally, by updating the conf value of "fs.permissions.umask-mode".
> LogWriter and ResourceLocalizationService may both call this method and thereby update the global umask.
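
The hazard described above can be illustrated with a minimal Java sketch. It is not Hadoop code: SketchContext, SharedUmaskSketch, and the helper methods are hypothetical stand-ins, and a plain map plays the role of the configuration object. Only the key name "fs.permissions.umask-mode" and the values "022" and "137" come from this report. The sketch assumes that two callers hold contexts backed by the same configuration, and contrasts that with each caller holding a private copy.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedUmaskSketch {
    // Key name taken from the report; everything else here is hypothetical.
    static final String UMASK_KEY = "fs.permissions.umask-mode";

    /** Hypothetical stand-in for a file context that keeps its umask in a conf map. */
    static final class SketchContext {
        private final Map<String, String> conf;

        SketchContext(Map<String, String> conf) {
            this.conf = conf;
        }

        // Writes the umask into the backing configuration.
        void setUMask(String umask) {
            conf.put(UMASK_KEY, umask);
        }

        // Reads the umask back, falling back to "022" when unset.
        String getUMask() {
            return conf.getOrDefault(UMASK_KEY, "022");
        }
    }

    /** Both contexts share one configuration: the second caller sees the first caller's change. */
    static String observedAfterSharedUpdate() {
        Map<String, String> shared = new HashMap<>();
        SketchContext logWriter = new SketchContext(shared);
        SketchContext localizer = new SketchContext(shared);
        logWriter.setUMask("137");
        return localizer.getUMask();
    }

    /** Each context owns a private copy: the second caller keeps the default. */
    static String observedAfterIsolatedUpdate() {
        Map<String, String> base = new HashMap<>();
        SketchContext logWriter = new SketchContext(new HashMap<>(base));
        SketchContext localizer = new SketchContext(new HashMap<>(base));
        logWriter.setUMask("137");
        return localizer.getUMask();
    }

    public static void main(String[] args) {
        System.out.println("shared conf:   " + observedAfterSharedUpdate());   // 137
        System.out.println("isolated conf: " + observedAfterIsolatedUpdate()); // 022
    }
}
```

In the shared case the second context reads "137" even though it never set a umask itself, which is the kind of cross-service side effect this issue reports.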

> After an application finishes, LogWriter sets the umask to "137" while
uploading logs for containers. The global umask value is updated immediately and affects
other services. In my case, after one of the local directories was marked as bad (because the
disk's used space was above the threshold defined by "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"),
ResourceLocalizationService reinitialized the remaining local directories and changed their permissions
from "drwxr-xr-x" to "drw-r-----" (the umask value had changed from "022" to "137"). From then on, the
NM always fails to localize resources because the local directories are not executable.
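
The permission change quoted above follows from ordinary umask arithmetic, which the following Java sketch works through. It assumes the directories are requested with mode 0777 before the umask is applied; that starting mode is an assumption chosen because it reproduces both strings in the report, not something the report states. UmaskDemo and its methods are hypothetical helper names.

```java
final class UmaskDemo {
    // Apply a umask to a requested mode: every bit set in the umask is cleared.
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    // Render the low 9 permission bits as an "rwxr-xr-x" style string.
    static String toSymbolic(int mode) {
        char[] flags = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? flags[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    // A directory can only be traversed by its owner if the owner execute bit is set.
    static boolean ownerCanTraverse(int mode) {
        return (mode & 0100) != 0;
    }

    public static void main(String[] args) {
        int before = applyUmask(0777, 0022); // 0755
        int after = applyUmask(0777, 0137);  // 0640
        System.out.println("d" + toSymbolic(before) + " traversable=" + ownerCanTraverse(before));
        System.out.println("d" + toSymbolic(after) + " traversable=" + ownerCanTraverse(after));
    }
}
```

With umask "137" the owner execute bit is cleared, so the directory can no longer be traversed, which matches the "Directory is not executable" error in the logs.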
> Detailed logs are as follows:
> {code}
> 2016-10-19 15:36:32,650 WARN org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext:
Disk Error Exception:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not executable: /home/yangtao.yt/hadoop-data/nm-local-dir-2/nmPrivate
>         at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:215)
>         at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:190)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:124)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:350)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:412)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
>         at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
> 2016-10-19 15:36:32,650 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Localizer failed
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local
directory for nmPrivate/container_e26_1476858409240_0004_01_000005.tokens
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
>         at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
> 2016-10-19 15:36:32,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_e26_1476858409240_0004_01_000005 transitioned from LOCALIZING to LOCALIZATION_FAILED
> {code}
> In my opinion, it would be better if FileContext remained compatible with its previous behavior.
> Please feel free to give your suggestions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

