Date: Fri, 19 Jun 2015 13:41:00 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3832) Resource Localization fails on a cluster due to existing cache directories

    [ https://issues.apache.org/jira/browse/YARN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593424#comment-14593424 ]

Jason Lowe commented on YARN-3832:
----------------------------------

It looks like the state store became out-of-sync with the local filesystem state. Can you look back in the NM logs to see when /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39 was originally created? Was the state store re-created or the disk declared bad/full in-between the creation of that directory and the error?
Seems like something would have had to go wrong with either storing the state or deleting the cache entry on the local disk for this to occur.

> Resource Localization fails on a cluster due to existing cache directories
> --------------------------------------------------------------------------
>
>                 Key: YARN-3832
>                 URL: https://issues.apache.org/jira/browse/YARN-3832
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Ranga Swamy
>            Assignee: Brahma Reddy Battula
>
> *We have found resource localization fails on a cluster with following error.*
>
> Got this error in hadoop-2.7.0 release which was fixed in 2.6.0 (YARN-2624)
> {noformat}
> Application application_1434703279149_0057 failed 2 times due to AM Container for appattempt_1434703279149_0057_000002 exited with exitCode: -1000
> For more detailed output, check application tracking page:http://S0559LDPag68:45020/cluster/app/application_1434703279149_0057Then, click on links to logs of each attempt.
> Diagnostics: Rename cannot overwrite non empty destination directory /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39
> java.io.IOException: Rename cannot overwrite non empty destination directory /opt/hdfsdata/HA/nmlocal/usercache/root/filecache/39
>     at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735)
>     at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:244)
>     at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678)
>     at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Failing this attempt. Failing the application.
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
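
For anyone trying to reproduce the condition in the trace outside a cluster, here is a minimal sketch using plain java.nio instead of Hadoop's FileContext. It is an illustration only, not the NodeManager's code: the class name StaleCacheDirSketch, the methods tryPublish and demo, and the file names are all hypothetical. It assumes the localizer downloads into a working directory and then renames that directory onto the numbered cache slot, so a leftover non-empty directory under the same number (one the state store no longer tracks) makes the rename fail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; none of these names exist in YARN.
final class StaleCacheDirSketch {

    // Moves the working directory onto the cache slot, asking to replace an
    // existing target. Returns false when the target is a directory that
    // still has entries, which java.nio reports as DirectoryNotEmptyException.
    static boolean tryPublish(Path work, Path dest) throws IOException {
        try {
            Files.move(work, dest, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (DirectoryNotEmptyException e) {
            return false;
        }
    }

    // Builds a throwaway cache layout in a temp directory and reports
    // whether the move onto slot "39" succeeded.
    static boolean demo(boolean leaveStaleEntry) {
        try {
            Path cache = Files.createTempDirectory("filecache-sketch");
            Path dest = Files.createDirectory(cache.resolve("39"));
            if (leaveStaleEntry) {
                // Simulates a cache entry left on disk that nothing tracks.
                Files.write(dest.resolve("leftover.jar"), new byte[] {1});
            }
            Path work = Files.createDirectory(cache.resolve("39_tmp"));
            Files.write(work.resolve("resource.jar"), new byte[] {2});
            return tryPublish(work, dest);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("empty slot, move succeeded: " + demo(false));
        System.out.println("stale slot, move succeeded: " + demo(true));
    }
}
```

The move onto an empty slot should succeed and the move onto the slot with a leftover file should be refused, which is the java.nio analogue of the "Rename cannot overwrite non empty destination directory" message quoted above. The sketch says nothing about why the leftover directory exists; that is the open question in the comment.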