Message-ID: <11090946.114501271855874117.JavaMail.jira@thor>
Date: Wed, 21 Apr 2010 09:17:54 -0400 (EDT)
From: "Danny Leshem (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-6688) FileSystem.delete(...) implementations should not throw FileNotFoundException

    [ https://issues.apache.org/jira/browse/HADOOP-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859341#action_12859341 ]

Danny Leshem commented on HADOOP-6688:
--------------------------------------

While going over the code to suggest a patch, I found that this was partially fixed as part of HADOOP-6201: S3FileSystem.delete now catches FileNotFoundException and simply returns false, instead of propagating the exception as described in this issue.

The reason I call this a partial fix is that such a recursive directory delete still stops (returning false) the moment the above race occurs. The proper behavior, in my opinion, is to 1) continue deleting the remaining files, and 2) return false only if the top-most directory could not be found.

> FileSystem.delete(...) implementations should not throw FileNotFoundException
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-6688
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6688
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/s3
>    Affects Versions: 0.20.2
>        Environment: Amazon EC2/S3
>            Reporter: Danny Leshem
>           Priority: Blocker
>             Fix For: 0.20.3, 0.21.0, 0.22.0
>
>
> S3FileSystem.delete(Path path, boolean recursive) may fail and throw a FileNotFoundException if a directory is being deleted while, at the same time, some of its files are deleted in the background.
> This is definitely not the expected behavior of a delete method. If one of the to-be-deleted files is found to be missing, the method should not fail and should simply continue.
> This is true for the general contract of FileSystem.delete, and also for its various implementations: RawLocalFileSystem (and specifically FileUtil.fullyDelete) exhibits the same problem.
> The fix is to silently catch and ignore FileNotFoundException in delete loops. This can very easily be unit-tested, at least for RawLocalFileSystem.
> The reason this issue bothers me is that the cleanup part of a long (Mahout) MR job inconsistently fails for me, and I think this is the root problem. The log shows:
> {code}
> java.io.FileNotFoundException: s3://S3-BUCKET/tmp/0008E25BF7554CA9/2521362836721872/DistributedMatrix.times.outputVector/_temporary/_attempt_201004061215_0092_r_000002_0/part-00002: No such file or directory.
>         at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:334)
>         at org.apache.hadoop.fs.s3.S3FileSystem.listStatus(S3FileSystem.java:193)
>         at org.apache.hadoop.fs.s3.S3FileSystem.delete(S3FileSystem.java:303)
>         at org.apache.hadoop.fs.s3.S3FileSystem.delete(S3FileSystem.java:312)
>         at org.apache.hadoop.mapred.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:64)
>         at org.apache.hadoop.mapred.OutputCommitter.cleanupJob(OutputCommitter.java:135)
>         at org.apache.hadoop.mapred.Task.runJobCleanupTask(Task.java:826)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:292)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> {code}
> (similar errors are displayed for ReduceTask.run)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
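The delete-loop behavior proposed above (keep deleting when one entry has vanished; return false only when the top-most path itself is gone) can be sketched roughly as follows. This is a hypothetical illustration against java.io.File, not the actual Hadoop patch or the FileSystem API — the class and method names are invented for the example:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the tolerant recursive delete suggested in the
// comment: a concurrently vanishing file or subdirectory is treated as
// "already deleted" rather than as an error, and only a missing top-most
// directory makes the call return false.
public class TolerantDelete {

    public static boolean delete(File path) {
        if (!path.exists()) {
            // Only the top-most path being absent is reported to the caller.
            return false;
        }
        deleteRecursively(path);
        return true;
    }

    private static void deleteRecursively(File path) {
        File[] children = path.listFiles();
        // listFiles() returns null if 'path' is not a directory or vanished
        // between the caller's check and now; either way there is nothing
        // left to recurse into, so we just fall through to the delete below.
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        // delete() returning false for an already-missing entry is ignored:
        // a concurrent deleter doing our work for us is not a failure.
        path.delete();
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "tolerant-delete-demo");
        new File(dir, "sub").mkdirs();
        new File(dir, "sub/part-00002").createNewFile();

        System.out.println(delete(dir)); // true: whole tree removed
        System.out.println(delete(dir)); // false: top-most directory already gone
    }
}
```

The design point is that the per-entry failure mode (FileNotFoundException in the S3 case, a false return from File.delete here) is swallowed inside the loop, so one background deletion cannot abort the traversal of its siblings.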