hc-dev mailing list archives

From "Gordon Mohr (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HTTPCLIENT-679) URI Absolutization does not follow browser behavior
Date Fri, 03 Aug 2007 17:12:53 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517586 ]

Gordon Mohr commented on HTTPCLIENT-679:
----------------------------------------

Notably, the browsers are following RFC 3986. Take this example from RFC 3986 section 5.4.1
("Normal Examples"):

URI uri = new URI(new URI("http://a/b/c/d;p?q"), "?y");
uri.toString(); // is "http://a/b/c/?y"; by RFC3986 should be "http://a/b/c/d;p?y"
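
For reference, the rule the browsers appear to apply can be sketched independently of HttpClient. The following is only a minimal illustration of the empty-path branch of RFC 3986 section 5.2.2 (the target keeps the base path and takes the reference's query); it is not HttpClient code. The class name QueryOnlyResolver and the method resolveEmptyPathReference are hypothetical, and the sketch deliberately rejects any reference that carries a scheme, authority, or non-empty path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HttpClient: resolves references whose
// path is empty (such as "?y") following RFC 3986 section 5.2.2.
public class QueryOnlyResolver {
    // URI-splitting regular expression from RFC 3986, Appendix B.
    // Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static final Pattern PARTS = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    public static String resolveEmptyPathReference(String base, String reference) {
        Matcher b = PARTS.matcher(base);
        Matcher r = PARTS.matcher(reference);
        if (!b.matches() || !r.matches()) {
            throw new IllegalArgumentException("unparseable URI");
        }
        // This sketch covers only references with no scheme, no authority,
        // and an empty path; everything else is out of scope.
        if (r.group(2) != null || r.group(4) != null || !r.group(5).isEmpty()) {
            throw new UnsupportedOperationException("only empty-path references are handled");
        }
        StringBuilder target = new StringBuilder();
        if (b.group(2) != null) {
            target.append(b.group(2)).append(':');
        }
        if (b.group(4) != null) {
            target.append("//").append(b.group(4));
        }
        // Empty reference path: the target path is the base path, unchanged.
        target.append(b.group(5));
        // The reference query wins if present; otherwise the base query is kept.
        String query = r.group(7) != null ? r.group(7) : b.group(7);
        if (query != null) {
            target.append('?').append(query);
        }
        // The fragment always comes from the reference, never from the base.
        if (r.group(9) != null) {
            target.append('#').append(r.group(9));
        }
        return target.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveEmptyPathReference("http://a/b/c/d;p?q", "?y"));
        System.out.println(resolveEmptyPathReference(
            "http://www.theirwebsite.com/browse/results?type=browse&att=1",
            "?sort=0&offset=11&pageSize=10"));
    }
}
```

Under these assumptions the sketch keeps the last path segment ("d;p" and "results" respectively) instead of dropping it, which matches both the RFC 3986 example and the URL the reporter expects.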




> URI Absolutization does not follow browser behavior
> ---------------------------------------------------
>
>                 Key: HTTPCLIENT-679
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-679
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient
>    Affects Versions: 3.1 RC1
>         Environment: HttpClient 3.1 RC1, JDK 1.6.0, Ubuntu 7.04
>            Reporter: Jeff Dalton
>
> This was encountered using Heritrix to crawl a prominent website.
> The URI resulting from the HttpClient URI constructor (base, relative) does not follow browser behavior:
> URI newUrl = new URI(new URI("http://www.theirwebsite.com/browse/results?type=browse&att=1"), "?sort=0&offset=11&pageSize=10");
> Results in newUrl:
> http://www.theirwebsite.com/browse/?sort=0&offset=11&pageSize=10
> The desired behavior based on Firefox and IE should be:
> http://www.theirwebsite.com/browse/results?sort=0&offset=11&pageSize=10
> These browsers treat the question mark similarly to a directory separator and do not require a file to be specified before the query.
> HttpClient's current behavior does not correspond to current browser behavior and makes it impossible to crawl certain websites if HttpClient's URI class is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: httpcomponents-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpcomponents-dev-help@jakarta.apache.org

