hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3497) File globbing with a PathFilter is too restrictive
Date Wed, 17 Sep 2008 16:49:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Tom White updated HADOOP-3497:

    Attachment: hadoop-3497-v3.patch

The test that is failing is TestFileInputFormatPathFilter#testWithPathFilterWithoutGlob. This
creates files named a, b, aa, bb in a directory, then uses an input format with a filter that
only accepts files whose last component is 1 character long. Only files a and b should match.
The input path is the directory itself, not a glob path, and to work it relies on the following
behaviour of FileSystem#globStatus.

If you call FileSystem#globStatus(Path pathPattern, PathFilter filter) with a pathPattern
that has a fixed (non-globbing) final component, then the status for that path will always
be returned, regardless of the filter.

So, for a path /a which exists

fs.globStatus(new Path("/a"), new PathFilter() {
  public boolean accept(Path path) {
    return false;
  }
});

will return the status for /a, even though the filter rejects every path!
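
To make the difference concrete, here is a small toy model in plain Java. It is only a sketch:
strings stand in for Path and FileStatus, and the class name GlobFilterSketch, the method
lookupFixedPath and the applyFilterToFixedPath flag are all hypothetical names invented for
illustration, not part of Hadoop. It contrasts the behaviour described above with one where
the filter is always consulted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy model only: plain strings stand in for Hadoop's Path and FileStatus.
class GlobFilterSketch {

    // Models the lookup of a fixed (non-glob) path.
    // applyFilterToFixedPath = false mimics the behaviour described above;
    // applyFilterToFixedPath = true mimics a lookup that always consults the filter.
    static List<String> lookupFixedPath(Set<String> existingPaths, String path,
                                        Predicate<String> filter,
                                        boolean applyFilterToFixedPath) {
        List<String> matches = new ArrayList<>();
        if (!existingPaths.contains(path)) {
            return matches; // nothing exists at this path
        }
        if (!applyFilterToFixedPath || filter.test(path)) {
            matches.add(path);
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("/a");
        Predicate<String> rejectAll = p -> false;
        // Filter skipped for a fixed path: /a is returned anyway.
        System.out.println(lookupFixedPath(existing, "/a", rejectAll, false)); // [/a]
        // Filter always applied: nothing is returned.
        System.out.println(lookupFixedPath(existing, "/a", rejectAll, true));  // []
    }
}
```

An application that passes a fixed path together with a filter that happens to reject it
would see the first result under the current behaviour and the second if the filter were
always applied.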

This seems wrong, and should really be changed. However, changing it could affect existing
applications, since a filter would now be applied where previously it wasn't. Does this seem
like the right thing to do?

I've attached a patch which fixes the test.
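
For reference, the filter logic exercised by the failing test can be sketched without any
Hadoop dependency. This is an assumption-laden illustration rather than the test's actual
code: java.nio.file.Path stands in for Hadoop's Path, and the names SingleCharFilterSketch,
SINGLE_CHAR_NAME and accepted, as well as the "dir" directory name, are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustration only: java.nio.file.Path stands in for Hadoop's Path.
class SingleCharFilterSketch {

    // Accepts a path only if its last component is exactly one character long.
    static final Predicate<Path> SINGLE_CHAR_NAME =
        p -> p.getFileName() != null && p.getFileName().toString().length() == 1;

    // Returns the last components of the given paths that pass the filter.
    static List<String> accepted(List<String> paths) {
        return paths.stream()
            .map(s -> Paths.get(s))
            .filter(SINGLE_CHAR_NAME)
            .map(p -> p.getFileName().toString())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The four files from the test, inside a hypothetical directory "dir".
        List<String> files = List.of("dir/a", "dir/b", "dir/aa", "dir/bb");
        System.out.println(accepted(files)); // [a, b]
    }
}
```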

> File globbing with a PathFilter is too restrictive
> --------------------------------------------------
>                 Key: HADOOP-3497
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3497
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.17.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: hadoop-3497-test.patch, hadoop-3497-v2.patch, hadoop-3497-v3.patch,
> Consider the file hierarchy
> {noformat}
> /a
> /a/b
> {noformat}
> Calling the globStatus method on FileSystem with a path of {noformat}/*/*{noformat} and
> a PathFilter that only accepts {{/a/b}} returns no matches. It should return a single match:

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.