hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14428) s3a: mkdir appears to be broken
Date Fri, 02 Jun 2017 00:55:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033986#comment-16033986

Mingliang Liu commented on HADOOP-14428:

Thanks [~fabbri] and [~stevel@apache.org] for review. This is tricky; I know. :)

The reason that existing test code did not catch this bug is because, as you said, the trailing
slash was removed before it's passed to S3AFileSystem. There are several places that will
remove trailing slashes, mostly {{URI::normalize()}} and {{Path::normalizePath()}}. All Path
constructors will call {{URI::normalize()}} implicitly after which there will be _at most_
one trailing slash. Meanwhile, some Path constructors call {{normalizePath()}} as well. Last,
{{Path::makeQualified()}} method _sometimes_ removes the trailing slash. In our existing code,
the {{Path::makeQualified()}} is called via {{path()}} helper method and it does remove trailing
slashes if any. So overall, the existing test code will not get a chance to carry trailing
slashes before passing to S3AFileSystem.

You can add as many trailing slashes, and it should pass always.
    // With trailing slash
    assertPathExists("mkdir with trailing slash failed", path("testmkdir/b/"));

Comes back to our case,
# note that we say _some Path constructors_ call {{normalizePath()}} and constructor {{Path(URI)}}
does not.
public Path(URI aUri) {
    uri = aUri.normalize();
So the uri is allowed to have one trailing slash. Unfortunately, fs.shell package is using
{{PathData}} class which creates a Path object from command line arguments via {{Path(URI)}}
constructor. In our case the trailing slash is preserved in constructor.
# Actually {{PathData}} will also call {{Path::makeQualified()}} method after constructing
a Path. However,  {{Path::makeQualified()}} _sometimes removes the trailing slash_ by calling
{{Path::normalizePath()}}, and sometimes === not fully qualified.
  /*  ...
   * @return this path if it contains a scheme and authority and is absolute, or
   * a new path that includes a path and authority and is fully qualified
  public Path makeQualified(URI defaultUri, Path workingDir ) {
Here in the command line of this JIRA, we provide a fully qualified path string. As a result,
the trailing slash is preserved again. Eventually the trailing slash gets passed to S3AFileSystem,
where we got fooled to believe {{/a/b/c/}} getParent returns {{/a/b}} while it actually returns

As to the new test case in this patch, the {{new Path(getContract().getTestPath() + "/testMkdirSlashHandling/e///")}}
will fail w/o this fix because:
# We have triple trailing slashes, while {{Path::normalizePath()}} is only able to remove
two of them (refer to its source code)
# After {{Path::normalizePath()}}, {{URI::normalize()}} allows one trailing slash.
So in S3AFileSystem, the Path to mkdir has the trailing slash and we have to remove before
calling {{getParent}}.

If only you find this amusing. Part of my understanding maybe wrong; correct me.

The findbugs warnings are unrelated, and the patch still apply cleanly. I'll hold on commit
by tomorrow. Thanks,

> s3a: mkdir appears to be broken
> -------------------------------
>                 Key: HADOOP-14428
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14428
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha2, HADOOP-13345
>            Reporter: Aaron Fabbri
>            Assignee: Mingliang Liu
>            Priority: Blocker
>         Attachments: HADOOP-14428.000.patch, HADOOP-14428.001.patch
> Reproduction is:
> hadoop fs -mkdir s3a://my-bucket/dir/
> hadoop fs -ls s3a://my-bucket/dir/
> ls: `s3a://my-bucket/dir/': No such file or directory
> I believe this is a regression from HADOOP-14255.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message