hadoop-yarn-issues mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-367) Exception when yarn.nodemanager.local-dirs is not explicitly set
Date Fri, 01 Feb 2013 18:02:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568914#comment-13568914 ]

Eli Reisman commented on YARN-367:
----------------------------------

This is great; I was actually about to do the same thing. After doing three installs of
Hadoop 2.0.x in a row according to the online instructions, and hitting weird conf problems
each time, I am sure there are a number of settings in the confs that result in errors just
like this. The AM container failure never reports a good exception message, and sometimes
even the nodemanager logs of the job attempt are not very specific, due to the nature of the
missing config, or ...?

Point being, it's almost always a config setting, and the framework almost never provides a
clue as to what really made it choke, other than job start and immediate failure (file not
found, as in Zhijie's example here, or something more cryptic).

                
> Exception when yarn.nodemanager.local-dirs is not explicitly set
> ----------------------------------------------------------------
>
>                 Key: YARN-367
>                 URL: https://issues.apache.org/jira/browse/YARN-367
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>
> If yarn.nodemanager.local-dirs is not explicitly set, and if the default local-dirs are
not the children of hadoop.tmp.dir, an exception occurs when the wordcount example is
run. Below is the log info.
> ==========
> 2013-01-30 22:16:04,229 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Start request for container_1359612879014_0001_01_000001 by user zshen
> 2013-01-30 22:16:04,247 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Creating a new application reference for app application_1359612879014_0001
> 2013-01-30 22:16:04,250 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
USER=zshen	IP=127.0.0.1	OPERATION=Start Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS
APPID=application_1359612879014_0001	CONTAINERID=container_1359612879014_0001_01_000001
> 2013-01-30 22:16:04,252 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Application application_1359612879014_0001 transitioned from NEW to INITING
> 2013-01-30 22:16:04,252 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Adding container_1359612879014_0001_01_000001 to application application_1359612879014_0001
> 2013-01-30 22:16:04,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Application application_1359612879014_0001 transitioned from INITING to RUNNING
> 2013-01-30 22:16:04,262 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1359612879014_0001_01_000001 transitioned from NEW to LOCALIZING
> 2013-01-30 22:16:04,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/appTokens
transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.jar
transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.splitmetainfo
transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.split
transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.xml
transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Created localizer for container_1359612879014_0001_01_000001
> 2013-01-30 22:16:04,401 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Writing credentials to the nmPrivate file /tmp/hadoop-zshen/nm-local-dir/nmPrivate/container_1359612879014_0001_01_000001.tokens.
Credentials list: 
> 2013-01-30 22:16:04,423 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Initializing user zshen
> 2013-01-30 22:16:04,569 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Copying from /tmp/hadoop-zshen/nm-local-dir/nmPrivate/container_1359612879014_0001_01_000001.tokens
to /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001.tokens
> 2013-01-30 22:16:04,570 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
CWD set to /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
= file:/tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> 2013-01-30 22:16:04,955 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out status for container: container_id {, app_attempt_id {, application_id {, id:
1, cluster_timestamp: 1359612879014, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics:
"", exit_status: -1000, 
> 2013-01-30 22:16:05,117 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/appTokens
transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,312 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.jar
transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,465 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.splitmetainfo
transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,608 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.split
transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.xml
transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,752 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1359612879014_0001_01_000001 transitioned from LOCALIZING to LOCALIZED
> 2013-01-30 22:16:05,866 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1359612879014_0001_01_000001 transitioned from LOCALIZED to RUNNING
> 2013-01-30 22:16:05,866 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
is disabled.
> 2013-01-30 22:16:05,910 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Failed to launch container.
> java.io.FileNotFoundException: File /Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001
does not exist
> 	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:498)
> 	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:996)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
> 	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
> 	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2379)
> 	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:726)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:330)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:135)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:680)
> 2013-01-30 22:16:05,913 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1359612879014_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
> 2013-01-30 22:16:05,914 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,934 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path : /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,934 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
USER=zshen	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container
failed with state: EXITED_WITH_FAILURE	APPID=application_1359612879014_0001	CONTAINERID=container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,937 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1359612879014_0001_01_000001 transitioned from EXITED_WITH_FAILURE to
DONE
> 2013-01-30 22:16:05,937 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Removing container_1359612879014_0001_01_000001 from application application_1359612879014_0001
> 2013-01-30 22:16:05,937 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
is disabled.
> 2013-01-30 22:16:05,958 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out status for container: container_id {, app_attempt_id {, application_id {, id:
1, cluster_timestamp: 1359612879014, }, attemptId: 1, }, id: 1, }, state: C_COMPLETE, diagnostics:
"", exit_status: -1, 
> 2013-01-30 22:16:05,959 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Removed completed container container_1359612879014_0001_01_000001
> 2013-01-30 22:16:06,965 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Application application_1359612879014_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
> 2013-01-30 22:16:06,965 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path : /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> 2013-01-30 22:16:06,966 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices:
Got event APPLICATION_STOP for appId application_1359612879014_0001
> 2013-01-30 22:16:06,970 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Application application_1359612879014_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP
to FINISHED
> 2013-01-30 22:16:06,970 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
Scheduling Log Deletion for application: application_1359612879014_0001, with delay of 10800
seconds
> ==========
> Below is the setting in hdfs-site.xml.
> ==========
> <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data</value>
> </property>
> ==========
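
For illustration only, here is a sketch of what setting the property explicitly might look like. It assumes the entry goes in yarn-site.xml and that the NodeManager local directory should sit under the hadoop.tmp.dir value quoted above; the nm-local-dir path is taken from the stack trace, not from a tested setup, and this is not presented as the fix adopted for YARN-367.

```xml
<!-- yarn-site.xml (assumed location): illustrative sketch, not the fix adopted in YARN-367 -->
<property>
    <name>yarn.nodemanager.local-dirs</name>
    <!-- Assumed value: the nm-local-dir path from the stack trace, under the quoted hadoop.tmp.dir -->
    <value>/Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data/nm-local-dir</value>
</property>
```

With the directory named explicitly, the NodeManager no longer has to derive it from hadoop.tmp.dir. Whether that avoids the FileNotFoundException above would still need to be checked against the affected build.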

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
