hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Omkar Vinit Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
Date Mon, 08 Apr 2013 19:01:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625678#comment-13625678
] 

Omkar Vinit Joshi commented on YARN-99:
---------------------------------------

Thanks vinod.. I have fixed all the comments.

to test private cache files I have updated the test TestResourceLocalizationService.testLocalizationHeartbeat.
Here I updated the test as follows.
1) made sure that we now have just one local-directory. To enforce its selection for consecutive
paths selection calls.
2) we are now making 2 LocalResourceRequests and verifying that second request's DestinationDirectory
path is inside sub-directory "0".


                
> Jobs fail during resource localization when private distributed-cache hits unix directory
limits
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-99
>                 URL: https://issues.apache.org/jira/browse/YARN-99
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.0.0-alpha
>            Reporter: Devaraj K
>            Assignee: Omkar Vinit Joshi
>         Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, yarn-99-20130403.patch
>
>
> If we have multiple jobs which uses distributed cache with small size of files, the directory
limit reaches before reaching the cache size and fails to create any directories in file cache.
The jobs start failing with the below exception.
> {code:xml}
> java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975
failed
> 	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> 	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> 	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
> 	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files if it crosses specified number of
directories like cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message