hadoop-yarn-issues mailing list archives

From "Knut O. Hellan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-527) Local filecache mkdir fails
Date Tue, 02 Apr 2013 13:39:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619797#comment-13619797 ]

Knut O. Hellan commented on YARN-527:
-------------------------------------

Digging through the code, it looks to me like the directory is actually created with the native
Java File.mkdirs, which does not give any information about why it failed. If that is the case,
then I guess this issue is really a feature request: YARN should be better at cleaning up old
file caches so that this situation does not happen.
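To illustrate the point about error reporting, here is a minimal, self-contained Java sketch (not Hadoop code; the class and method names are hypothetical) contrasting java.io.File.mkdirs, which only returns a boolean, with java.nio.file.Files.createDirectories, which throws an IOException describing the failure. It provokes a failure by trying to create a directory underneath a regular file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Hadoop or YARN.
public class MkdirDiagnostics {

    /** java.io.File API: only a boolean comes back, so the cause is lost. */
    static String mkdirWithFile(Path dir) {
        boolean ok = dir.toFile().mkdirs();
        return ok ? "created" : "failed (File.mkdirs gives no reason)";
    }

    /** java.nio.file API: a failure is an IOException that names the cause. */
    static String mkdirWithNio(Path dir) {
        try {
            Files.createDirectories(dir);
            return "created";
        } catch (IOException e) {
            // e.g. AccessDeniedException, or a FileSystemException whose
            // message carries the operating system's error text.
            return "failed: " + e;
        }
    }

    /** Builds a target that cannot be created: its "parent" is a regular file. */
    static Path blockedTarget() {
        try {
            Path root = Files.createTempDirectory("filecache-demo");
            Path blocker = Files.createFile(root.resolve("blocker"));
            return blocker.resolve("child");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path target = blockedTarget();
        System.out.println(mkdirWithFile(target));
        System.out.println(mkdirWithNio(target));
    }
}
```

On Linux the NIO call typically reports the operating system's reason (here "Not a directory"), which is the kind of detail the log in this report lacks.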
                
> Local filecache mkdir fails
> ---------------------------
>
>                 Key: YARN-527
>                 URL: https://issues.apache.org/jira/browse/YARN-527
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.0-alpha
>         Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes.
>            Reporter: Knut O. Hellan
>            Priority: Minor
>         Attachments: yarn-site.xml
>
>
> Jobs failed with no other explanation than this stack trace:
> 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_000000_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
>         at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
>         at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>         at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>         at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
>         at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Manually creating the directory worked. The same behavior occurred on at least several nodes in the cluster.
> The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes.
> It is unclear whether YARN struggled with the number of files or whether there were corrupt files in the caches. The situation was triggered by a node dying.
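To check the "number of files" hypothesis on an affected node before wiping the cache, one could count the subdirectories in each filecache directory. The sketch below is a hypothetical helper, not part of YARN, and assumes the /disk?/yarn/local/filecache layout from this report. A count in the tens of thousands would point at unchecked cache growth; some file systems cap subdirectories per directory (ext3, for example, at about 32,000).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of YARN.
public class FilecacheCensus {

    /** Counts the immediate subdirectories of dir (not recursive). */
    static long countSubdirectories(Path dir) throws IOException {
        long count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage (paths assumed from this report), e.g.:
        //   java FilecacheCensus /disk1/yarn/local/filecache /disk3/yarn/local/filecache
        for (String arg : args) {
            Path dir = Paths.get(arg);
            System.out.println(dir + ": " + countSubdirectories(dir) + " subdirectories");
        }
    }
}
```

It counts only immediate subdirectories, which is what a per-directory limit would constrain.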

