hadoop-yarn-issues mailing list archives

From "Anubhav Dhoot (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner
Date Mon, 08 Dec 2014 21:15:12 GMT

     [ https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anubhav Dhoot updated YARN-2931:
--------------------------------
    Description: 
When the data directory is cleaned up and the NM is started with existing recovery state, it
will not recreate the local dirs because of YARN-90.
This causes PublicLocalizer to fail until getInitializedLocalDirs is called by some
LocalizeRunner for private localization.

Instead, PublicLocalizer can stop depending on this and call getInitializedLocalDirs itself,
so that it handles initialization on its own, as non-public localization does.

Example error:

{noformat}
2014-12-02 22:57:32,629 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Failed to download rsrc { { hdfs:/<blah machine>:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
1417589819618, FILE, null },pending,[(container_1417589109512_0001_02_000003)],119413444132127,DOWNLOADING}
java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
	at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-12-02 22:57:32,629 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1417589109512_0001_02_000003 transitioned from LOCALIZING to LOCALIZATION_FAILED
{noformat}
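
A rough sketch of the idea, not the actual NodeManager code: the trace above fails in a non-recursive mkdir because the parent filecache directory is missing, so the public path could run an idempotent initialization step before creating its download directory. The class and method names below (LazyLocalDirInit, ensureInitialized, localizePublic, createDownloadDir) are hypothetical, and java.nio.file stands in for Hadoop's FileContext; ensureInitialized only plays the role that getInitializedLocalDirs would play.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a localizer; not the NodeManager's real classes. */
public class LazyLocalDirInit {

    /** Non-recursive mkdir: throws if the parent (e.g. filecache) does not exist. */
    static Path createDownloadDir(Path parent, String name) throws IOException {
        return Files.createDirectory(parent.resolve(name));
    }

    /** Idempotent initialization; assumed analogue of getInitializedLocalDirs. */
    static Path ensureInitialized(Path fileCache) throws IOException {
        // No-op when the directory already exists, so every caller may invoke it.
        return Files.createDirectories(fileCache);
    }

    /** Public localization that initializes the cache directory on its own. */
    static Path localizePublic(Path fileCache, String name) throws IOException {
        ensureInitialized(fileCache);
        return createDownloadDir(fileCache, name);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("nm-local");
        Path fileCache = root.resolve("filecache"); // deliberately left uncreated

        try {
            // Mirrors the failure mode in the trace: the parent is missing.
            createDownloadDir(fileCache, "10");
        } catch (IOException e) {
            System.out.println("mkdir failed without initialization");
        }

        // With self-initialization the same request succeeds.
        Path dir = localizePublic(fileCache, "11");
        System.out.println("created directory: " + Files.isDirectory(dir));
    }
}
```

Because the initialization step is idempotent, it would not matter whether a private localizer or the public one reaches it first.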

  was:
When the data directory is cleaned up and the NM is started with existing recovery state, it
will not recreate the local dirs because of YARN-90.
This causes PublicLocalizer to fail until getInitializedLocalDirs is called by some
LocalizeRunner for private localization.

Instead, PublicLocalizer can stop depending on this and call getInitializedLocalDirs itself,
so that it handles initialization on its own, as non-public localization does.


> PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2931
>                 URL: https://issues.apache.org/jira/browse/YARN-2931
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Anubhav Dhoot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)