hadoop-yarn-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
Date Fri, 19 Sep 2014 03:35:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139927#comment-14139927 ]

Hadoop QA commented on YARN-2566:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12669893/YARN-2566.000.patch
  against trunk revision 6434572.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5037//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5037//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5037//console

This message is automatically generated.

> IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk
space for the first localDir.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2566
>                 URL: https://issues.apache.org/jira/browse/YARN-2566
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2566.000.patch
>
>
> startLocalizer in DefaultContainerExecutor uses only the first localDir to copy the
> token file. If that copy fails because the first localDir does not have enough disk space,
> localization fails even when the other localDirs have plenty of disk space.
> We see the following error in this case:
> {code}
> 2014-09-13 23:33:25,171 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Unable to create app directory /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
failed
> 	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
> 	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
> 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> 	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Localizer failed
> java.io.FileNotFoundException: File file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
does not exist
> 	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
> 	at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344)
> 	at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
> 	at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
> 	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
> 	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
> 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> 	at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
> 	at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
> 	at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,186 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1410663092546_0004_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
> 2014-09-13 23:33:25,187 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
USER=cloudera	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container
failed with state: LOCALIZATION_FAILED	APPID=application_1410663092546_0004	CONTAINERID=container_1410663092546_0004_01_000001
> 2014-09-13 23:33:25,187 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1410663092546_0004_01_000001 transitioned from LOCALIZATION_FAILED to
DONE
> 2014-09-13 23:33:25,187 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Removing container_1410663092546_0004_01_000001 from application application_1410663092546_0004
> 2014-09-13 23:33:25,187 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
Considering container container_1410663092546_0004_01_000001 for log-aggregation
> 2014-09-13 23:33:25,187 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices:
Got event CONTAINER_STOP for appId application_1410663092546_0004
> 2014-09-13 23:33:25,187 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path : /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004/container_1410663092546_0004_01_000001
> 2014-09-13 23:33:25,187 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
delete returned false for path: [/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004/container_1410663092546_0004_01_000001]
> 2014-09-13 23:33:25,188 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path : /hadoop/d2/usercache/cloudera/appcache/application_1410663092546_0004/container_1410663092546_0004_01_000001
> 2014-09-13 23:33:25,188 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
delete returned false for path: [/hadoop/d2/usercache/cloudera/appcache/application_1410663092546_0004/container_1410663092546_0004_01_000001]
> 2014-09-13 23:33:25,291 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Stopping resource-monitoring for container_1410663092546_0004_01_000001
> 2014-09-13 23:33:26,159 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Removed completed container container_1410663092546_0004_01_000001
> {code}
> The correct behavior is: if an IOException occurs during the copy, try the next
> localDir; if the copy fails for all localDirs, throw an exception.
> I will create a patch to fix this issue.
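
The fallback described in the issue can be illustrated with a minimal Java sketch. This is not the attached patch: it uses java.nio.file instead of Hadoop's FileContext, and the class TokenCopyFallback, the method copyToFirstUsableDir, and the fileName parameter are hypothetical names chosen for this example. It only shows the control flow: try each localDir in order, remember every failure, and throw only when none of them worked.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
final class TokenCopyFallback {

    /**
     * Copies tokenFile into the first directory of localDirs that accepts it.
     * Returns the path of the copy, or throws an IOException that carries
     * every per-directory failure as a suppressed exception.
     */
    static Path copyToFirstUsableDir(Path tokenFile, List<Path> localDirs, String fileName)
            throws IOException {
        IOException allFailed = null;
        for (Path dir : localDirs) {
            try {
                // Either step can fail, for example when the disk is full.
                Files.createDirectories(dir);
                Path target = dir.resolve(fileName);
                Files.copy(tokenFile, target, StandardCopyOption.REPLACE_EXISTING);
                return target; // success: stop at the first usable directory
            } catch (IOException e) {
                // Record the failure and fall through to the next directory.
                if (allFailed == null) {
                    allFailed = new IOException(
                            "Token copy failed for all local dirs: " + localDirs);
                }
                allFailed.addSuppressed(e);
            }
        }
        if (allFailed == null) {
            throw new IOException("No local dirs configured");
        }
        throw allFailed;
    }
}
```

Keeping each per-directory failure as a suppressed exception means the final error still shows why every localDir was rejected, which the single-directory stack trace above cannot do.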



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
