Return-Path:
X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 82D81D4AC for ; Sat, 1 Dec 2012 16:58:04 +0000 (UTC)
Received: (qmail 64756 invoked by uid 500); 1 Dec 2012 16:58:04 -0000
Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org
Received: (qmail 63418 invoked by uid 500); 1 Dec 2012 16:58:01 -0000
Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: yarn-issues@hadoop.apache.org
Delivered-To: mailing list yarn-issues@hadoop.apache.org
Received: (qmail 63346 invoked by uid 99); 1 Dec 2012 16:57:59 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 01 Dec 2012 16:57:59 +0000
Date: Sat, 1 Dec 2012 16:57:59 +0000 (UTC)
From: "Brock Noland (JIRA)"
To: yarn-issues@hadoop.apache.org
Message-ID: <1687957003.49230.1354381079569.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (YARN-152) Exception from launching allocated container
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/YARN-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508001#comment-13508001 ]

Brock Noland commented on YARN-152:
-----------------------------------

FWIW, running the job as the yarn user works for me. To me it looks like there is some issue with user resolution. Note the user directory that follows usercache in the two paths below (brock in the first, yarn in the second).
CWD set to /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/brock/appcache/application_1354339131583_0001 = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-d
java.io.FileNotFoundException: File /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/yarn/appcache/application_1354339131583_0001 does not exist

> Exception from launching allocated container
> --------------------------------------------
>
> Key: YARN-152
> URL: https://issues.apache.org/jira/browse/YARN-152
> Project: Hadoop YARN
> Issue Type: Bug
> Components: applications/distributed-shell, nodemanager
> Affects Versions: 2.0.1-alpha
> Reporter: Bing Jiang
>
> I use Hadoop YARN to deploy my real-time distributed computation system. Following a reply from mapreduce-user@hadoop.apache.org, I worked from the guides below:
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
> When I followed the steps to construct my Client and ApplicationMaster, the NM failed to launch a container because of a java.io.FileNotFoundException.
> The relevant part of the NM log is attached below:
> ....
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1325062142731_0006_01_000001 to application application_1325062142731_0006
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ApplicationLocalizationEvent.EventType: INIT_APPLICATION_RESOURCES
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationInitedEvent.EventType: APPLICATION_INITED
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Processing application_1325062142731_0006 of type APPLICATION_INITED
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1325062142731_0006 transitioned from INITING to RUNNING
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerAppStartedEvent.EventType: APPLICATION_STARTED
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerInitEvent.EventType: INIT_CONTAINER
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1325062142731_0006_01_000001 of type INIT_CONTAINER
> 2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1325062142731_0006_01_000001 transitioned from NEW to LOCALIZED
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType: LAUNCH_CONTAINER
> 2011-12-29 15:49:16,287 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent.EventType: CONTAINER_LAUNCHED
> 2011-12-29 15:49:16,287 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1325062142731_0006_01_000001 of type CONTAINER_LAUNCHED
> 2011-12-29 15:49:16,287 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1325062142731_0006_01_000001 transitioned from LOCALIZED to RUNNING
> 2011-12-29 15:49:16,288 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerStartMonitoringEvent.EventType: START_MONITORING_CONTAINER
> 2011-12-29 15:49:16,289 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Failed to launch container
> java.io.FileNotFoundException: File /tmp/nm-local-dir/usercache/jiangbing/appcache/application_1325062142731_0006 does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:431)
>     at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:815)
>     at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>     at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>     at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:700)
>     at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:697)
>     at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>     at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:697)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:123)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:237)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:67)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType: CONTAINER_EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1325062142731_0006_01_000001 of type CONTAINER_EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1325062142731_0006_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType: CLEANUP_CONTAINER
> 2011-12-29 15:49:16,290 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1325062142731_0006_01_000001
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Marking container container_1325062142731_0006_01_000001 as inactive
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Getting pid for container container_1325062142731_0006_01_000001 to kill from pid file /tmp/nm-local-dir/nmPrivate/container_1325062142731_0006_01_000001.pid
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Accessing pid for container container_1325062142731_0006_01_000001 from pid file /tmp/nm-local-dir/nmPrivate/container_1325062142731_0006_01_000001.pid
> 2011-12-29 15:49:16,307 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ContainerLocalizationCleanupEvent.EventType: CLEANUP_CONTAINER_RESOURCES
> To find the cause, I traced back through the source code and found the following in org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> @Override
> public int launchContainer(Container container,
>     Path nmPrivateContainerScriptPath, Path nmPrivateTokensPath,
>     String userName, String appId, Path containerWorkDir)
>     throws IOException {
>   ....
>   String[] sLocalDirs = getConf().getStrings(
>       YarnConfiguration.NM_LOCAL_DIRS,
>       YarnConfiguration.DEFAULT_NM_LOCAL_DIRS);
>   for (String sLocalDir : sLocalDirs) {
>     Path usersdir = new Path(sLocalDir, ContainerLocalizer.USERCACHE);
>     Path userdir = new Path(usersdir, userName);
>     Path appCacheDir = new Path(userdir, ContainerLocalizer.APPCACHE);
>     Path appDir = new Path(appCacheDir, appIdStr);
>     Path containerDir = new Path(appDir, containerIdStr);
>     lfs.mkdir(containerDir, null, false);
>   }
>   ....
> In the call lfs.mkdir(containerDir, null, false), the last argument is createParent. According to the mkdir API, false means that missing parent directories are not created, so the call fails when the application directory does not exist yet.
> In my Hadoop build, I changed lfs.mkdir(containerDir, null, false); to lfs.mkdir(containerDir, null, true); and my program then ran correctly.
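
The createParent behaviour described above can be reproduced without a Hadoop installation. The sketch below is only an analogy: it uses the JDK's java.nio.file API in place of Hadoop's FileContext, on the assumption that Files.createDirectory behaves like mkdir with createParent=false and Files.createDirectories like createParent=true. The class name, the helper nonRecursiveFails, and the directory names (someuser, application_0001, container_0001) are hypothetical and not taken from the NodeManager code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MkdirParentDemo {

    // Tries to create only the last path component, like mkdir(dir, perm, false).
    // Returns true if the attempt failed, which is what happens when a parent is missing.
    static boolean nonRecursiveFails(Path dir) {
        try {
            Files.createDirectory(dir);
            return false;
        } catch (IOException e) {
            // Typically a NoSuchFileException when a parent directory does not exist.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // A throwaway directory standing in for an NM local dir (hypothetical layout).
        Path localDir = Files.createTempDirectory("nm-local-dir");
        Path containerDir = localDir
                .resolve("usercache").resolve("someuser")
                .resolve("appcache").resolve("application_0001")
                .resolve("container_0001");

        // None of the parents exist yet, so the non-recursive create is expected to fail.
        System.out.println("non-recursive create failed: " + nonRecursiveFails(containerDir));

        // The recursive variant creates every missing parent, like createParent=true.
        Files.createDirectories(containerDir);
        System.out.println("directory exists after recursive create: "
                + Files.isDirectory(containerDir));
    }
}
```

Whether passing true is the right fix for the NodeManager is a separate question: if the application directory is expected to exist after localization, its absence may point to a problem elsewhere, such as the user resolution mismatch noted in the comment.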