hadoop-common-issues mailing list archives

From "Ivan Mitic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8409) Address Hadoop path related issues on Windows
Date Wed, 23 May 2012 19:15:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281813#comment-13281813 ]

Ivan Mitic commented on HADOOP-8409:
------------------------------------

Saved the best for last :)

Daryn, thanks for bringing up HADOOP-8139, as it is indeed something we want to address on
Windows.

Let's first make sure that I understood the problem correctly. The Jira is about the '\' character
being used to escape metacharacters, and the replace("\\", "/") in Path breaks this.
Your current fix in 0.23 addresses the problem on Unix by skipping this "problematic" replace,
but leaves the problem in place on Windows. Please correct me if I'm mistaken, as it's a long discussion.
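To make the conflict concrete, here is a small standalone sketch of what a blanket replace("\\", "/") does to an escaped glob. It is plain Java string handling, not Hadoop's actual Path code, and the class and method names are made up for illustration. The backslash meant to escape '*' turns into a path separator, so a pattern meant to match a file literally named "data*" ends up matching every entry under /logs/data.

```java
// Illustrative only: mimics the effect of a blanket replace("\\", "/")
// on a path string. This is NOT Hadoop's Path implementation.
public class EscapeClash {

    // Hypothetical helper: every backslash becomes a forward slash,
    // regardless of whether it was a separator or an escape.
    static String normalizeSeparators(String pathString) {
        return pathString.replace("\\", "/");
    }

    public static void main(String[] args) {
        // Intended meaning: the file literally named "data*" in /logs,
        // with the '*' metacharacter escaped by a backslash.
        String escapedGlob = "/logs/data\\*";

        // After normalization the escape is gone: '*' is a wildcard again,
        // and a new separator appeared where the escape used to be.
        String normalized = normalizeSeparators(escapedGlob);
        System.out.println(escapedGlob + " -> " + normalized);
        // prints: /logs/data\* -> /logs/data/*
    }
}
```

On Windows the same character is also the native separator, so a string like C:\logs\data\* cannot be read unambiguously either way.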

> After a long discussion in HADOOP-8139, it was decided that only RFC standard URIs will
> be supported by hadoop. Paths using "\" are not going to be supported.
@Daryn: I would prefer to steer the discussion toward how Hadoop can support "\" on Windows,
and to work with the community on an acceptable solution. Asking users to enter input paths
in the form "c:/some/path" does not seem like the right thing to do. Please let me know if you
agree with me here. I would prefer to address HADOOP-8139 in a separate change, as this change
moves us forward with Windows support and does not break Unix behavior.

> file:/// should not allow authority or port - it is for local file systems.
@Sanjay: I was just trying to illustrate the problem, sorry for the confusion.

> There's no reason you have to, you can always use new Path(String).
@Daryn: Actually, this does not work for paths that name a symlink. For example, new Path("/some/path#symlink")
will encode the "#" character internally, so we lose the symlink behavior. This is why I believe
this is a good change. If you look at the changes I've made to GenericOptionsParser.java,
you can see how this simplifies things at the call site.
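The '#' behavior can be reproduced with java.net.URI alone. The sketch below assumes, without quoting Path's source here, that Path(String) builds its internal URI through the multi-argument URI constructor. That constructor treats its input as a pure path component and percent-quotes '#', whereas parsing the same string as a URI reference keeps "symlink" as the fragment. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: uses java.net.URI directly, not Hadoop's Path.
public class FragmentDemo {

    // Multi-argument constructor: the string is a path component only,
    // so '#' is not a delimiter and gets percent-quoted as %23.
    static URI fromPathComponent(String path) {
        try {
            return new URI(null, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Single-string parsing: the string is a full URI reference,
    // so '#' starts the fragment.
    static URI fromUriString(String s) {
        return URI.create(s);
    }

    public static void main(String[] args) {
        String input = "/some/path#symlink";

        URI quoted = fromPathComponent(input);
        System.out.println(quoted + " fragment=" + quoted.getFragment());
        // prints: /some/path%23symlink fragment=null

        URI parsed = fromUriString(input);
        System.out.println(parsed.getPath() + " fragment=" + parsed.getFragment());
        // prints: /some/path fragment=symlink
    }
}
```

Only the second form keeps the fragment that the symlink behavior depends on.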

                
> Address Hadoop path related issues on Windows
> ---------------------------------------------
>
>                 Key: HADOOP-8409
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8409
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, test, util
>    Affects Versions: 1.0.0
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-8409-branch-1-win.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There are multiple places in prod and test code where Windows paths are not handled properly.
> From a high level this could be summarized as:
> 1. Windows paths are not necessarily valid DFS paths (while Unix paths are)
> 2. Windows paths are not necessarily valid URIs (while Unix paths are)
> #1 causes a number of tests to fail because they implicitly assume that local paths are
> valid DFS paths (by deriving the DFS test path from, for example, the "test.build.data" property)
> #2 causes issues when URIs are directly created on path strings passed in by the user
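Point #2 above can be seen with java.net.URI directly. This is a minimal standalone sketch, not code from the attached patch, and the class and helper names are hypothetical. A backslashed Windows path is rejected outright; a forward-slash one parses, but the drive letter is misread as the URI scheme; a Unix path comes through as a plain scheme-less path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: shows how java.net.URI reads Windows-style path strings.
public class WindowsUriDemo {

    // Hypothetical helper: returns the parsed URI, or null when the
    // string is not a legal URI.
    static URI tryParse(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Backslash is not a legal URI character, so parsing fails.
        System.out.println(tryParse("C:\\some\\path"));
        // prints: null

        // Forward slashes parse, but "C" is taken as the scheme.
        URI win = tryParse("C:/some/path");
        System.out.println(win.getScheme() + " | " + win.getPath());
        // prints: C | /some/path

        // A Unix path parses as a scheme-less path, as expected.
        URI unix = tryParse("/some/path");
        System.out.println(unix.getScheme() + " | " + unix.getPath());
        // prints: null | /some/path
    }
}
```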

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
