hc-dev mailing list archives

From "Oleg Kalnichevski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HTTPCLIENT-587) derelativizing of relative URIs with a scheme is incorrect
Date Fri, 16 Jun 2006 18:17:30 GMT
    [ http://issues.apache.org/jira/browse/HTTPCLIENT-587?page=comments#action_12416567 ] 

Oleg Kalnichevski commented on HTTPCLIENT-587:
----------------------------------------------

> For all of 3.0.1 URI's problems, it's better than the Sun class. 

What's wrong with the JDK URI class? I thought it was kind of okay? Actually I was thinking
about suggesting that our 'home brewed' URI class be replaced with the JDK URI class, as HttpClient
4.0 will require Java 1.4 anyways.

> We are still using the 3.x HttpClient in production systems (web crawling), since no later releases are officially available.

Have you looked at HttpCore? I believe its API should be better suited for web crawlers. For
one, HttpCore does not attempt to validate request-URIs. It will happily execute requests against
any arbitrary request URI.

> I will try to work up a patch.

Please do so. If you do not provide a fix for this bug, it will most likely have to wait until
4.0.

Oleg

> derelativizing of relative URIs with a scheme is incorrect
> ----------------------------------------------------------
>
>          Key: HTTPCLIENT-587
>          URL: http://issues.apache.org/jira/browse/HTTPCLIENT-587
>      Project: Jakarta HttpClient
>         Type: Bug

>     Versions: 3.0.1
>     Reporter: Gordon Mohr

>
> URI constructor "public URI(URI base, URI relative) throws URIException" assumes that if the given 'relative' URI has a scheme, it should provide an authority and complete path to the constructed URI. However, a URI can have a scheme but still be relative, requiring the authority and base path of the 'base' URI.
> Demonstration code:
> URI base = new URI("http://www.example.com/some/page");
> URI rel = new URI("http:boo");
> URI derel = new URI(base,rel);
> derel.toString();
> (java.lang.String) http:boo
> In fact, derel should be "http://www.example.com/some/boo". 
> RFC2396 is a little confused about this; section 3.1 states "Relative URI references are distinguished from absolute URI in that they do not begin with a scheme name." But, in section 5, there are several sentences talking about relative URIs that begin with schemes (and how this prevents using relative URIs that have leading path segments that look like scheme identifiers).
> RFC3986, which supersedes RFC2396, removes the implication that a relative URI cannot begin with a scheme, leaving the other text explicitly discussing relative URIs with schemes.
> Both Firefox (1.5) and IE (6.0) treat "http:boo" the same as "boo" for purposes of derelativization against an HTTP base URI, which would give the final URI "http://www.example.com/some/boo" in the example above.
> Even relative URIs like "http:../../boo" are explicitly legal. 
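
The browser-style behavior described above can be sketched with the JDK's java.net.URI. This is only an illustration under stated assumptions, not the HttpClient fix: the class name LenientResolve and the helper resolveLenient are hypothetical. The JDK class parses "http:boo" as an opaque URI, and its resolve() returns an opaque argument unchanged, so the sketch strips a scheme that matches the base scheme before resolving. RFC 3986 section 5.4.2 lists this lenient result as allowed for backward compatibility, while a strict parser keeps "http:boo" as it is.

```java
import java.net.URI;

public class LenientResolve {

    // Hypothetical helper (not part of HttpClient or the JDK).
    // If the reference starts with the base URI's scheme but carries no
    // "//authority" part, drop the scheme so the rest resolves as a
    // relative reference against the base.
    static URI resolveLenient(URI base, String ref) {
        String scheme = base.getScheme();
        if (scheme != null) {
            String prefix = scheme + ":";
            boolean sameScheme =
                ref.regionMatches(true, 0, prefix, 0, prefix.length());
            boolean hasAuthority = ref.startsWith("//", prefix.length());
            if (sameScheme && !hasAuthority) {
                ref = ref.substring(prefix.length());
            }
        }
        return base.resolve(ref);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.example.com/some/page");

        // Strict JDK behavior: the opaque reference is returned unchanged.
        System.out.println(base.resolve("http:boo"));
        // prints http:boo

        // Lenient behavior, matching what the report expects.
        System.out.println(resolveLenient(base, "http:boo"));
        // prints http://www.example.com/some/boo

        // A reference with its own authority is left alone.
        System.out.println(resolveLenient(base, "http://other.example.org/x"));
        // prints http://other.example.org/x
    }
}
```

The scheme comparison is case-insensitive because URI schemes are case-insensitive; references with a different scheme (for example "mailto:") fall through to the ordinary resolve() call.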

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-dev-help@jakarta.apache.org

