hadoop-common-dev mailing list archives

From "Ian Nowland (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5836) Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs to fail
Date Thu, 14 May 2009 18:16:45 GMT

[ https://issues.apache.org/jira/browse/HADOOP-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709509#action_12709509 ]

Ian Nowland commented on HADOOP-5836:

The main fix here is to check for this empty file in listStatus() and simply not return it.
Along with this, I broadened the handling in all S3N methods to cover the different ways of
designating directories in S3, as described in this comment:
 * A note about directories. S3 of course has no "native" support for them.
 * The idiom we choose then is: for any directory created by this class,
 * we use an empty object "#{dirpath}_$folder$" as a marker.
 * Further, to interoperate with other S3 tools, we also accept the following:
 * - an object "#{dirpath}/" denoting a directory marker
 * - if any objects exist with the prefix "#{dirpath}/", then the
 *   directory is said to exist
 * - if both a file with the name of a directory and a marker for that
 *   directory exist, then the *file masks the directory*, and the directory
 *   is never returned.
In particular, this meant fixing delete() and rename() to handle all three possible meanings
of a directory without failing.
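
To make the rules above concrete, here is a minimal sketch of how they could be applied to a flat set of S3 keys. It is not the code from the patch: the class DirectoryMarkers, its method names, and the use of an in-memory Set<String> in place of real S3 listings are all assumptions made for illustration. Only the "_$folder$" suffix and the three directory conventions come from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not the actual S3N implementation. */
final class DirectoryMarkers {
    static final String FOLDER_SUFFIX = "_$folder$";

    /** An object stored under exactly this key is a file. */
    static boolean isFile(Set<String> keys, String path) {
        return keys.contains(path);
    }

    /** Applies the three directory conventions, with the file-masks-directory rule first. */
    static boolean isDirectory(Set<String> keys, String path) {
        if (isFile(keys, path)) {
            return false; // a file of the same name masks the directory
        }
        if (keys.contains(path + FOLDER_SUFFIX)) {
            return true; // marker written by this class
        }
        String prefix = path + "/";
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                return true; // either the "path/" marker itself or any object below it
            }
        }
        return false;
    }

    /** Immediate child names of dir, never returning the marker as an empty-named file. */
    static List<String> listChildren(Set<String> keys, String dir) {
        String prefix = dir + "/";
        Set<String> children = new TreeSet<>();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            String name = slash < 0 ? rest : rest.substring(0, slash);
            if (name.endsWith(FOLDER_SUFFIX)) {
                name = name.substring(0, name.length() - FOLDER_SUFFIX.length());
            }
            if (!name.isEmpty()) { // skips the "dir/" marker, whose remainder is ""
                children.add(name);
            }
        }
        return new ArrayList<>(children);
    }
}
```

The empty-name check in listChildren() corresponds to the main fix: the "dir/" marker is recognized as evidence that the directory exists but is not handed back to the caller as a file.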
This patch also includes the following:
- Add logging any time a file in S3 is accessed for read or write, so that when a task fails
while accessing or using a file, the file's name will be in the task log.
- When opening a file for reading that doesn't exist, immediately throw a
FileNotFoundException, rather than returning a hard-to-debug NPE later when the file is
closed.
- Rewrite rename so that it only deletes the source files after every destination file has
been written, so you never end up with half the files in each location.
- Set up a retryer so rename automatically retries on S3 errors.
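
As an illustration of the last two items, the following is a rough sketch of a rename that copies everything first, deletes sources only afterwards, and retries each store operation. It is not the patch itself: SafeRename, ObjectStore, IoAction, InMemoryStore and the fixed attempt count are hypothetical names and choices made for this example, and the real code talks to S3 rather than an in-memory map.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; names and interfaces are assumptions, not the S3N code. */
final class SafeRename {
    /** Minimal stand-in for an S3 client. */
    interface ObjectStore {
        List<String> list(String prefix) throws IOException;
        void copy(String from, String to) throws IOException;
        void delete(String key) throws IOException;
    }

    interface IoAction {
        void run() throws IOException;
    }

    /** Runs the action up to maxAttempts times and rethrows the last IOException. */
    static void retry(int maxAttempts, IoAction action) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return;
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Copies every object under src to dst, then deletes the sources. */
    static void rename(ObjectStore store, String src, String dst, int maxAttempts)
            throws IOException {
        String srcPrefix = src + "/";
        List<String> sources = new ArrayList<>(store.list(srcPrefix));
        // Phase 1: write every destination object. A failure here leaves all sources intact.
        for (String key : sources) {
            String target = dst + "/" + key.substring(srcPrefix.length());
            retry(maxAttempts, () -> store.copy(key, target));
        }
        // Phase 2: only now remove the sources.
        for (String key : sources) {
            retry(maxAttempts, () -> store.delete(key));
        }
    }

    /** In-memory store for demonstration; the first failuresLeft copy calls fail. */
    static final class InMemoryStore implements ObjectStore {
        final Map<String, String> objects = new TreeMap<>();
        int failuresLeft;

        public List<String> list(String prefix) {
            List<String> out = new ArrayList<>();
            for (String key : objects.keySet()) {
                if (key.startsWith(prefix)) {
                    out.add(key);
                }
            }
            return out;
        }

        public void copy(String from, String to) throws IOException {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IOException("simulated S3 error");
            }
            objects.put(to, objects.get(from));
        }

        public void delete(String key) {
            objects.remove(key);
        }
    }
}
```

If a copy still fails after the last attempt, rename() throws before any delete has run, so every source file is still in place, although some destination copies may already exist.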

> Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs
> to fail
> ------------------------------------------------------------------------------------------------
>                 Key: HADOOP-5836
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5836
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 0.18.3
>            Reporter: Ian Nowland
> Some tools which upload to S3 use an object terminated with a "/" as a directory marker,
> for instance "s3n://mybucket/mydir/". If asked to iterate that "directory" via listStatus(),
> the current code will return an empty file "", which the InputFormatter happily assigns
> to a split, and which later causes a task to fail, and probably the job to fail.
