hadoop-common-issues mailing list archives

From "Pieter Reuse (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
Date Wed, 01 Jul 2015 13:16:05 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pieter Reuse updated HADOOP-12169:
----------------------------------
    Description: 
While testing the patch for HADOOP-11918, I noticed an inconsistency that it introduces in
the S3AFileSystem class. Calling listStatus() on an empty bucket returns an empty list,
while calling it on an empty directory returns an array of length 1 containing only the
directory itself.

The fix is simple. In the line {code}...if (keyPath.equals(f)...{code} (S3AFileSystem:758),
keyPath is qualified with respect to the file system and f is not. The comparison therefore
returns false when it should return true. The fix is to qualify f before this comparison.
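
A minimal sketch of the mismatch, written in Python rather than the Java of S3AFileSystem.
The qualify() helper, the bucket name and the paths are hypothetical stand-ins for
illustration and are not Hadoop API; the sketch only shows why comparing a qualified path
with an unqualified one fails, and why qualifying both sides first makes them equal.

```python
from urllib.parse import urlsplit

# Hypothetical file system URI, standing in for the bucket the fs is bound to.
FS_URI = "s3a://bucket"


def qualify(path: str, fs_uri: str = FS_URI) -> str:
    """Return path with the scheme and bucket of fs_uri prepended.

    A path that already carries a scheme is returned unchanged.
    """
    if urlsplit(path).scheme:
        return path
    return fs_uri.rstrip("/") + "/" + path.lstrip("/")


# key_path plays the role of keyPath (already qualified),
# f plays the role of the unqualified argument passed by the caller.
key_path = qualify("/data/empty")
f = "/data/empty"

print(key_path == f)           # False: the directory is not recognised as itself
print(key_path == qualify(f))  # True: both sides are qualified before comparing
```

With the unqualified comparison the directory is not recognised as the listed path itself,
so it ends up in its own listing.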

More formally: according to [The Hadoop FileSystem API Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
specifically the definition of FileSystem.listStatus, a listStatus() call should return only
the child elements of a directory.

In detail: 
{code}
elif isDir(FS, p): result = [getFileStatus(c) for c in children(FS, p) where f(c) == True]
{code}
and
{code}
def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
{code}

It follows that listStatus on an empty directory should return an empty list. This matches
the behaviour of ls in Unix, which is what one would expect from a FileSystem.
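
The two definitions quoted above can be modelled on a toy file system to check this
conclusion. This is a sketch under stated assumptions: the file system is just a set of
path strings plus a set of directories, list_status() returns paths instead of FileStatus
objects, and the filter is named accept here (it is f in the specification) to avoid
confusion with the path f in the Java line above. None of these names are Hadoop API.

```python
from posixpath import dirname


def children(paths, p):
    """Paths whose parent is p, following the specification's children(FS, p).

    dirname("/") is "/", so p itself is excluded explicitly to keep the
    root from being listed as its own child.
    """
    return {q for q in paths if q != p and dirname(q) == p}


def list_status(paths, dirs, p, accept=lambda c: True):
    """Toy model of listStatus: returns paths rather than FileStatus objects."""
    if p not in paths:
        raise FileNotFoundError(p)
    if p not in dirs:
        # p is a file: the result is the file itself if the filter accepts it.
        return [p] if accept(p) else []
    # p is a directory: only its children are returned, never p itself.
    return sorted(c for c in children(paths, p) if accept(c))


# Hypothetical file system contents.
paths = {"/", "/data", "/data/empty", "/data/full", "/data/full/a.txt"}
dirs = {"/", "/data", "/data/empty", "/data/full"}

print(list_status(paths, dirs, "/data/empty"))  # []
print(list_status(paths, dirs, "/data/full"))   # ['/data/full/a.txt']
```

Under this model the empty directory yields an empty list, since no path has it as a parent.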

Note: it seemed appropriate to add the test for this patch to the same file as the test for
HADOOP-11918. As a result, one of the two patches will have to be rebased on the other before
being applied to trunk.


> ListStatus on empty dir in S3A lists itself instead of returning an empty list
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-12169
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12169
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Pieter Reuse
>            Assignee: Pieter Reuse
>         Attachments: HADOOP-12169-001.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
