hadoop-common-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
Date Mon, 04 Jun 2018 23:35:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501034#comment-16501034 ]

Xiao Chen commented on HADOOP-15217:
------------------------------------

Thanks very much for the thoughts here Zsolt, and the discussion offline.

To summarize, I think the core problem is that {{URL#openStream}} can throw an IOException (IOE)
for all kinds of reasons, so this patch does not explicitly wrap the call in a try/catch. The added
test is valid to make sure there is no regression, and in case of a regression, I'd argue that changing
the exception type from IOE to AssertionError doesn't change the probability of a developer ignoring
the test. For the same reason, we don't wrap filesystem.create calls to make sure they don't regress. :)

So given that the extra 6 lines of code don't make debugging easier, we should aim for readability
and let the underlying exception fail the test in case of a regression.
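
To make the trade-off concrete, here is a minimal sketch of the two styles being compared. It is plain Java rather than the actual test in the patch: the class name, the method names, and the deliberately missing file URL are all hypothetical, chosen only so that {{URL#openStream}} fails. Either way the failure reaches the caller; only the exception type differs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;

public class ExceptionStyleSketch {

    // Style A: wrap the call and convert the IOException into an AssertionError.
    static void wrapped(URL url) {
        try (InputStream in = url.openStream()) {
            in.read();
        } catch (IOException e) {
            throw new AssertionError("Unable to open stream for " + url, e);
        }
    }

    // Style B: no wrapping; the underlying IOException fails the caller directly.
    static void unwrapped(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            in.read();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical path that is assumed not to exist on the local machine.
        URL url = URI.create("file:/nonexistent-hadoop-15217/missing%20file.txt").toURL();

        try {
            wrapped(url);
        } catch (AssertionError e) {
            System.out.println("wrapped:   " + e.getClass().getSimpleName()
                + " caused by " + e.getCause().getClass().getSimpleName());
        }

        try {
            unwrapped(url);
        } catch (IOException e) {
            System.out.println("unwrapped: " + e.getClass().getSimpleName());
        }
    }
}
```

In a test framework both variants would be reported as a failing test with the same root cause in the stack trace, which is the point of the argument above.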

> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-15217
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15217
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0
>            Reporter: Joseph Fourny
>            Assignee: Zsolt Venczel
>            Priority: Major
>         Attachments: HADOOP-15217.01.patch, HADOOP-15217.02.patch, HADOOP-15217.03.patch,
HADOOP-15217.04.patch, HADOOP-15217.05.patch, TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (e.g., when Spark is
initialized), it breaks URLs with spaces (even though they are properly URI-encoded). I traced
the problem down to the _FsUrlConnection.connect()_ method. It naively gets the path from the
URL, which contains encoded spaces, and passes it to the _org.apache.hadoop.fs.Path(String)_ constructor.
This is not correct, because the docs clearly say that the string must NOT be encoded. Doing
so causes double encoding within the Path class (i.e., %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons Configuration 1.9
AFTER initializing Spark. Commons Configuration uses the URL class to load configuration files,
but Spark installs _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using
an AspectJ aspect to "patch" the bytecode at load time to work around the issue. 
> The real fix is quite simple. All you need to do is replace this line in _org.apache.hadoop.fs.FsUrlConnection.connect()_:
>         is = fs.open(new Path(url.getPath()));
> with this line:
>         is = fs.open(new Path(url.*toURI()*.getPath()));
> URI.getPath() will correctly decode the path, which is what is expected by _org.apache.hadoop.fs.Path(String)_ constructor.
>  
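
The difference between the two calls can be checked with the JDK alone. The sketch below is not Hadoop code: the class name, the helper names, and the sample file URL are hypothetical, and the multi-argument _java.net.URI_ constructor is used only as a stand-in for what _Path(String)_ is assumed to do internally (that constructor always quotes the '%' character, which is the double encoding described above).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlPathDecodingSketch {

    // URL#getPath returns the path exactly as written in the URL, still percent-encoded.
    static String rawPath(URL url) {
        return url.getPath();
    }

    // URL#toURI followed by URI#getPath returns the decoded path.
    static String decodedPath(URL url) throws URISyntaxException {
        return url.toURI().getPath();
    }

    // Stand-in for Path(String) (an assumption, not the Hadoop class):
    // the multi-argument URI constructor quotes illegal characters and always quotes '%'.
    static String encodeUnencodedPath(String path) throws URISyntaxException {
        return new URI(null, null, path, null, null).toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL with a properly encoded space.
        URL url = URI.create("file:/tmp/my%20dir/app.properties").toURL();

        System.out.println(rawPath(url));     // /tmp/my%20dir/app.properties
        System.out.println(decodedPath(url)); // /tmp/my dir/app.properties

        // Passing the already-encoded path encodes it a second time.
        System.out.println(encodeUnencodedPath(rawPath(url)));     // /tmp/my%2520dir/app.properties
        // Passing the decoded path yields the single, correct encoding.
        System.out.println(encodeUnencodedPath(decodedPath(url))); // /tmp/my%20dir/app.properties
    }
}
```

Note that _URL.toURI()_ declares _URISyntaxException_, so a caller such as _connect()_ would need to handle or translate that checked exception.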



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

