hadoop-yarn-issues mailing list archives

From "Brock Noland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-152) Exception from launching allocated container
Date Sat, 01 Dec 2012 16:57:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508001#comment-13508001
] 

Brock Noland commented on YARN-152:
-----------------------------------

FWIW, running the job as the yarn user works for me.

To me it looks like there is some issue with user resolution. Note the directory after the
usercache directory in the two paths below: the CWD uses brock, but the missing path uses yarn.

CWD set to /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/brock/appcache/application_1354339131583_0001
= file:/var/lib/hadoop-yarn/cache/yarn/nm-local-d

java.io.FileNotFoundException: File /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/yarn/appcache/application_1354339131583_0001
does not exist


                
> Exception from launching allocated container
> --------------------------------------------
>
>                 Key: YARN-152
>                 URL: https://issues.apache.org/jira/browse/YARN-152
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications/distributed-shell, nodemanager
>    Affects Versions: 2.0.1-alpha
>            Reporter: Bing Jiang
>
> I use Hadoop YARN to deploy my real-time distributed computation system, and I got a reply
from mapreduce-user@hadoop.apache.org telling me to follow the guides below:
>          http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html
>          http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
> When I followed the steps to construct my Client and ApplicationMaster, an issue occurred:
the NM fails to launch a container because of a java.io.FileNotFoundException.
> The relevant part of the NM log is attached below:
>  ....
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Adding container_1325062142731_0006_01_000001 to application application_1325062142731_0006
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ApplicationLocalizationEvent.EventType:
INIT_APPLICATION_RESOURCES
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationInitedEvent.EventType:
APPLICATION_INITED
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Processing application_1325062142731_0006 of type APPLICATION_INITED
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Application application_1325062142731_0006 transitioned from INITING to RUNNING
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerAppStartedEvent.EventType:
APPLICATION_STARTED
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerInitEvent.EventType:
INIT_CONTAINER
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_1325062142731_0006_01_000001 of type INIT_CONTAINER
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1325062142731_0006_01_000001 transitioned from NEW to LOCALIZED
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType:
LAUNCH_CONTAINER
> 2011-12-29 15:49:16,287 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent.EventType:
CONTAINER_LAUNCHED
> 2011-12-29 15:49:16,287 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_1325062142731_0006_01_000001 of type CONTAINER_LAUNCHED
> 2011-12-29 15:49:16,287 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1325062142731_0006_01_000001 transitioned from LOCALIZED to RUNNING
> 2011-12-29 15:49:16,288 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerStartMonitoringEvent.EventType:
START_MONITORING_CONTAINER
> 2011-12-29 15:49:16,289 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Failed to launch container
> java.io.FileNotFoundException: File /tmp/nm-local-dir/usercache/jiangbing/appcache/application_1325062142731_0006
does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:431)
>     at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:815)
>     at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>     at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>     at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:700)
>     at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:697)
>    at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>     at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:697)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:123)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:237)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:67)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType:
CONTAINER_EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_1325062142731_0006_01_000001 of type CONTAINER_EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1325062142731_0006_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType:
CLEANUP_CONTAINER
> 2011-12-29 15:49:16,290 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1325062142731_0006_01_000001
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Marking container container_1325062142731_0006_01_000001 as inactive
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Getting pid for container container_1325062142731_0006_01_000001 to kill from pid file /tmp/nm-local-dir/nmPrivate/container_1325062142731_0006_01_000001.pid
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Accessing pid for container container_1325062142731_0006_01_000001 from pid file /tmp/nm-local-dir/nmPrivate/container_1325062142731_0006_01_000001.pid
> 2011-12-29 15:49:16,307 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching
the event org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ContainerLocalizationCleanupEvent.EventType:
CLEANUP_CONTAINER_RESOURCES
> To find the cause, I traced back through the source code and found this in org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> @Override
>   public int launchContainer(Container container,
>       Path nmPrivateContainerScriptPath, Path nmPrivateTokensPath,
>       String userName, String appId, Path containerWorkDir)
>       throws IOException {
>       ....
>     String[] sLocalDirs = getConf().getStrings(
>         YarnConfiguration.NM_LOCAL_DIRS,
>         YarnConfiguration.DEFAULT_NM_LOCAL_DIRS);
>     for (String sLocalDir : sLocalDirs) {
>       Path usersdir = new Path(sLocalDir, ContainerLocalizer.USERCACHE);
>       Path userdir = new Path(usersdir, userName);
>       Path appCacheDir = new Path(userdir, ContainerLocalizer.APPCACHE);
>       Path appDir = new Path(appCacheDir, appIdStr);
>       Path containerDir = new Path(appDir, containerIdStr);
>       lfs.mkdir(containerDir, null, false);
>    }
>   ....
> In lfs.mkdir(containerDir, null, false), the final argument (false) means, per the mkdir
API, that missing parent directories are not created.
> In my Hadoop build, I changed lfs.mkdir(containerDir, null, false) to lfs.mkdir(containerDir,
null, true), and then my program ran fine.
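
The createParent behaviour described above can be reproduced without Hadoop. The following is only a sketch using plain java.nio.file calls as a stand-in for FileContext.mkdir; the class name MkdirParentDemo, the helper methods, and the directory names are hypothetical and are not taken from the NodeManager code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class MkdirParentDemo {

    /**
     * Stand-in for mkdir(dir, permission, createParent).
     * Assumption: java.nio is used here only as an analogy to the Hadoop API.
     */
    static void mkdir(Path dir, boolean createParent) throws IOException {
        if (createParent) {
            Files.createDirectories(dir); // creates every missing ancestor
        } else {
            Files.createDirectory(dir);   // throws if the parent does not exist
        }
    }

    /** Returns true if the directory was created, false if a parent was missing. */
    static boolean tryMkdir(Path dir, boolean createParent) throws IOException {
        try {
            mkdir(dir, createParent);
            return true;
        } catch (NoSuchFileException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout mirroring <local-dir>/usercache/<user>/appcache/<app>/<container>
        Path root = Files.createTempDirectory("nm-local-dir");
        Path containerDir = root.resolve("usercache").resolve("someuser")
                .resolve("appcache").resolve("application_0001")
                .resolve("container_0001");

        System.out.println("createParent=false: " + tryMkdir(containerDir, false));
        System.out.println("createParent=true:  " + tryMkdir(containerDir, true));
    }
}
```

Note that passing true only hides the missing application directory. If that directory was supposed to exist already under a different user's usercache, as the comment at the top of this message suggests, the underlying mismatch remains.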

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
