hadoop-common-issues mailing list archives

From "Ivan Mitic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows
Date Sun, 28 Jul 2013 18:17:49 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722018#comment-13722018 ]

Ivan Mitic commented on HADOOP-9774:

Thanks Shanyu for reporting the problem and the patch.

A few comments below.

bq. The reason why I didn't add a unit test is because this behavior cannot be tested on Linux.
I'm reusing HADOOP-8962's unit test case to verify that this change doesn't break on Linux.

Please include a unit test that catches the issue on Windows; otherwise, some other change
could break this functionality again.

bq. I think the behavior of RawLocalFileSystem.listStatus() will diverge with this patch,
which is not a very good practice.

I share the same concern. Now that we have realized that HADOOP-8962 does not work in all scenarios,
I'd like to consider other options first. I think we can solve this problem properly in Path.
There is a comment in HADOOP-8962 stating that changing Path is scary (which I have to agree
with), but at this point, I'd rather go with a clean fix. Others, please comment if you agree.

The root cause of the problem for HADOOP-8962 is actually in the Path(String, String) constructor,
and comes from the internal Path URI parsing logic. I don't think we want to change the parsing
logic, as that would be really risky. What we can do is change the constructor:
  public Path(String parent, String child) {
    this(new Path(parent), new Path(child));
  }
so that it does not create a Path object directly out of the child path string, but first creates a
URI (on the assumption that the child string is just the path portion of the URI), and then constructs
the child Path object from this URI. One gotcha to keep in mind is the URI encoding/decoding
of escape chars. Otherwise, I think this will work.
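
The idea of treating the child string purely as the path portion of a URI can be illustrated with plain java.net.URI, without any Hadoop classes. This is only a rough sketch, not the actual patch: the class ChildPathSketch and the helper childAsPathOnlyUri are hypothetical names, and the "./" prefix for relative children is an assumption made here so that a colon in the first segment is not read as a scheme separator. Windows drive letters are not handled.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ChildPathSketch {

    // Hypothetical helper (not a Hadoop API): build a URI whose only
    // component is the path, so the child string is never scanned for a scheme.
    static URI childAsPathOnlyUri(String child) {
        // Assumption: prefix relative children with "./" so that a colon in
        // the first segment cannot be read as a scheme separator.
        String path = child.startsWith("/") ? child : "./" + child;
        try {
            // The multi-argument constructor percent-encodes illegal
            // characters in the path, and always encodes '%' itself.
            return new URI(null, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Parsing the whole string: the text before the colon becomes the scheme.
        URI parsed = URI.create("part:0001");
        System.out.println(parsed.getScheme() + " | " + parsed.getPath());

        // Path-only construction: no scheme, the colon stays inside the path.
        URI child = childAsPathOnlyUri("part:0001");
        System.out.println(child.getScheme() + " | " + child.getPath());

        // Encoding: a space is escaped in the raw form and decoded by getPath().
        URI spaced = childAsPathOnlyUri("/my data/t1.txt");
        System.out.println(spaced.getRawPath() + " | " + spaced.getPath());

        // The escape gotcha: an already-escaped child is escaped a second time.
        URI escaped = childAsPathOnlyUri("/a%20b");
        System.out.println(escaped.getRawPath() + " | " + escaped.getPath());
    }
}
```

The last case shows why escape handling needs care: if callers may pass strings that are already percent-encoded, building the URI this way encodes the '%' again, so any real change would have to decide which form the child string is expected to be in.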
> RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows
> ----------------------------------------------------------------------------------------------
>                 Key: HADOOP-9774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9774
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: shanyu zhao
>         Attachments: HADOOP-9774.patch
> On Windows, when using RawLocalFileSystem.listStatus() to enumerate a relative path (without
> drive spec), e.g., "file:///mydata", the resulting paths become absolute paths, e.g., ["file://E:/mydata/t1.txt",
> Note that if we use it to enumerate an absolute path, e.g., "file://E:/mydata", then
> we get the same results as above.
> This breaks some Hive unit tests, which use the local file system to simulate HDFS when testing,
> so the drive spec is removed. Then, after listStatus(), the path is changed to an absolute
> path, and Hive fails to find the path in its MapReduce job.
> You'll see the following exception:
> [junit] java.io.IOException: cannot find dir = pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt
in pathToPartitionInfo: [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
> [junit] 	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
> This problem is introduced by this JIRA:
> HADOOP-8962
> Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths were relative
> paths if the parent paths were relative, e.g., ["file:///mydata/t1.txt", "file:///mydata/t2.txt"...]
> This behavior change is a side effect of the fix in HADOOP-8962, not an intended change.
> The resulting behavior, even though it is legitimate from a functional point of view, breaks consistency
> from the caller's point of view. When the caller uses a relative path (without drive spec)
> to do listStatus(), the resulting paths should be relative. Therefore, I think this should be
