Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DA92CE80F for ; Fri, 22 Feb 2013 01:01:32 +0000 (UTC) Received: (qmail 66878 invoked by uid 500); 22 Feb 2013 01:01:32 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 66846 invoked by uid 500); 22 Feb 2013 01:01:32 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 66837 invoked by uid 99); 22 Feb 2013 01:01:32 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 22 Feb 2013 01:01:32 +0000 Date: Fri, 22 Feb 2013 01:01:32 +0000 (UTC) From: "Hitesh Shah (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (YARN-99) Jobs fail during resource localization when directories in file cache reaches to unix directory limit MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-99: ---------------------------- Priority: Blocker (was: Major) > Jobs fail during resource localization when directories in file cache reaches to unix directory limit > ----------------------------------------------------------------------------------------------------- > > Key: YARN-99 > URL: https://issues.apache.org/jira/browse/YARN-99 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 3.0.0, 2.0.0-alpha > Reporter: Devaraj K > Assignee: Devaraj K > Priority: Blocker > > If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache. The jobs start failing with the below exception. > {code:xml} > java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > We should have a mechanism to clean the cache files if it crosses specified number of directories like cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira