hadoop-yarn-issues mailing list archives

From "Knut O. Hellan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-527) Local filecache mkdir fails
Date Wed, 03 Apr 2013 08:11:15 GMT

    [ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620711#comment-13620711 ]

Knut O. Hellan commented on YARN-527:
-------------------------------------

There is really no difference in how the directories are created. What probably happened under
the hood is that the file system hit its per-directory limit in the filecache directory.
On ext3, which we use, a single directory can hold roughly 32,000 subdirectories at most. I don't
have exact numbers for any of the disks from my checks, but I remember seeing counts above 30k
in some places. The reason we were able to create directories manually might be that some
automatic cleanup had happened in the meantime. Does YARN clean the file cache?
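
To check whether a filecache directory is close to that limit, something like the following could be run on each node. This is only a sketch: the figure of 31998 assumes the usual ext3 hard-link cap of 32000 per directory, the helper names (count_subdirs, check_filecache) and the 90% warning threshold are made up for illustration, and the /disk?/yarn/local/filecache pattern is simply the layout from this report.

```python
import glob
import os

# Assumption: ext3 caps a directory's hard-link count at 32000, which leaves
# room for 31998 subdirectories (two links are used by "." and the parent entry).
EXT3_MAX_SUBDIRS = 31998


def count_subdirs(path):
    """Count the immediate subdirectories of path, without following symlinks."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def check_filecache(path, limit=EXT3_MAX_SUBDIRS, warn_ratio=0.9):
    """Return (count, status), where status is 'ok', 'near-limit', or 'full'."""
    count = count_subdirs(path)
    if count >= limit:
        return count, "full"
    if count >= limit * warn_ratio:
        return count, "near-limit"
    return count, "ok"


if __name__ == "__main__":
    # Path pattern taken from this report; adjust it to the configured local dirs.
    for cache_dir in sorted(glob.glob("/disk?/yarn/local/filecache")):
        count, status = check_filecache(cache_dir)
        print(f"{cache_dir}: {count} subdirectories ({status})")
```

A "full" or "near-limit" result on the disk named in the stack trace would support the subdirectory-limit explanation; an "ok" result would point elsewhere, for example at corrupt cache entries.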
                
> Local filecache mkdir fails
> ---------------------------
>
>                 Key: YARN-527
>                 URL: https://issues.apache.org/jira/browse/YARN-527
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.0-alpha
>         Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes.
>            Reporter: Knut O. Hellan
>            Priority: Minor
>         Attachments: yarn-site.xml
>
>
> Jobs failed with no other explanation than this stack trace:
> 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_000000_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
>         at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
>         at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>         at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>         at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
>         at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Manually creating the directory worked. This behavior was common to several nodes in the cluster.
> The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes.
> It is unclear whether YARN struggled with the number of files or whether there were corrupt files in the caches. The situation was triggered by a node dying.
