hc-dev mailing list archives

From "Noah Levitt (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HTTPCLIENT-900) Don't enforce URI syntax
Date Tue, 02 Oct 2012 00:25:08 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467356#comment-13467356 ]

Noah Levitt commented on HTTPCLIENT-900:

"technically-illegal-but-usually-functional URIs appear all over... any moderately sized crawl
of the web will encounter hundreds of such URIs. Other dominant net software, such as web
browsers, already tolerate such URIs. So to match commonly expected behavior, and support
real-world net applications"... one can't use java.net.URI.
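
A minimal JDK-only illustration of that strictness, assuming a Java 8+ runtime. The StrictUriDemo class and the sample URIs are hypothetical, picked to show characters that browsers tolerate but that java.net.URI rejects under its RFC 2396 grammar. It also shows why HttpGet(String) fails the way it does: URI.create() reports the same parse error as an unchecked IllegalArgumentException.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo: java.net.URI follows RFC 2396 and refuses characters
// that browsers usually send through unchanged.
public class StrictUriDemo {

    // Returns true if java.net.URI refuses to parse the string.
    static boolean rejects(String candidate) {
        try {
            new URI(candidate);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/a b",     // unescaped space in the path
            "http://example.com/?q=a|b",  // '|' is not a legal URI character
            "http://example.com/?q={x}",  // neither are '{' and '}'
            "http://example.com/?q=a%7Cb" // the '|' query, percent-encoded
        };
        for (String s : samples) {
            System.out.println((rejects(s) ? "rejected: " : "accepted: ") + s);
        }

        // URI.create() reports the same failure unchecked: an
        // IllegalArgumentException whose cause is the URISyntaxException.
        try {
            URI.create("http://example.com/a b");
        } catch (IllegalArgumentException e) {
            System.out.println("URI.create failed, cause: " + e.getCause());
        }
    }
}
```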

The use of java.net.URI is preventing Heritrix, the open source web crawler, from moving to
HttpClient. Using HttpCore only is an interesting idea that I will look into. But I already
have code for Heritrix to use HttpClient, so I'd like to prepare a patch for this issue.

> HttpClient 3.x codeline has its own URI implementation, which has been the single largest
> source of issues/bugs. I am, for one, very reluctant to repeat the same mistake.

To address this, a new org.apache.http.URI class could simply wrap a java.net.URI, carrying
along its validation. The key difference would be that it would not be final, so that users
of the library, such as Heritrix, could override that implementation.

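
A rough sketch of that proposal, assuming a Java 8+ runtime. The names HttpUri and LenientHttpUri are hypothetical stand-ins for the org.apache.http.URI idea; nothing here is an existing HttpClient API. The default validation simply delegates to java.net.URI, and the class is left non-final so a crawler can substitute its own policy. The lenient rule shown (refuse only whitespace and control characters) is an illustrative assumption, not Heritrix's actual policy.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of the proposal: a request-URI type that validates
// with java.net.URI by default but is deliberately NOT final.
public class HttpUri {
    private final String raw;

    public HttpUri(String raw) throws URISyntaxException {
        // Runs before subclass fields are initialized, so overrides of
        // validate() must not rely on subclass state.
        validate(raw);
        this.raw = raw;
    }

    // Default policy: accept exactly what java.net.URI accepts.
    protected void validate(String raw) throws URISyntaxException {
        new URI(raw);
    }

    // The string as the caller supplied it, for use on the request line.
    public String toRequestString() {
        return raw;
    }
}

// Hypothetical crawler-side override: only whitespace and control
// characters are refused; braces, '|' and the like pass through as-is.
class LenientHttpUri extends HttpUri {
    public LenientHttpUri(String raw) throws URISyntaxException {
        super(raw);
    }

    @Override
    protected void validate(String raw) throws URISyntaxException {
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c <= 0x20 || c == 0x7f) {
                throw new URISyntaxException(raw, "Whitespace or control character", i);
            }
        }
    }
}
```

One caveat with subclassing as the extension point: validate() runs inside the superclass constructor, so an override cannot depend on the subclass's own fields. Passing a validator object to the constructor would avoid that trade-off while keeping the default java.net.URI behaviour.
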
> Don't enforce URI syntax
> ------------------------
>                 Key: HTTPCLIENT-900
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-900
>             Project: HttpComponents HttpClient
>          Issue Type: Improvement
>          Components: HttpClient
>    Affects Versions: 4.0 Final
>            Reporter: Marko Asplund
>            Priority: Minor
>             Fix For: Future
> I'm trying to use HttpComponents Client for fetching data from a web site.
> I've run into problems that seem to be related to the way the request URL query parameters
> are handled on the server side.
> The service doesn't encode unsafe characters (e.g. '{' and '}') in response URLs.
> Also, when these characters are encoded on the client prior to issuing the request, the
> service gives incorrect responses.
> The URLs are of the following form:
> http://www.foo.bar/foobar?${APPL}=hetekaue
> On the other hand, HC Client doesn't allow me to send requests with invalid query syntax
> (the HttpGet(String) constructor throws an IllegalArgumentException caused by a URISyntaxException).
> It would be good if HC Client could also be used in situations like this.
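
The reporter's situation can be reproduced with the JDK alone. A minimal sketch, assuming Java 8+; BraceQueryDemo is a hypothetical name. It shows that the raw URL fails to parse, and that the component-wise java.net.URI constructor, which quotes illegal characters, yields the percent-encoded form that the reporter says this service does not handle correctly.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo of the report: the raw URL is rejected, and the only
// form java.net.URI will build percent-encodes the braces.
public class BraceQueryDemo {

    static final String RAW = "http://www.foo.bar/foobar?${APPL}=hetekaue";

    // Returns the parse error for RAW, or null if it parses.
    static String parseFailure() {
        try {
            new URI(RAW);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason() + " at index " + e.getIndex();
        }
    }

    // Builds the same URL from components; the five-argument constructor
    // quotes characters that are not legal in the query ('{' and '}').
    static String encodedForm() throws URISyntaxException {
        URI uri = new URI("http", "www.foo.bar", "/foobar", "${APPL}=hetekaue", null);
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println("new URI(raw): " + parseFailure());
        System.out.println("encoded:      " + encodedForm());
    }
}
```

The encoded result is `http://www.foo.bar/foobar?$%7BAPPL%7D=hetekaue`: the `$` and `=` survive because they are reserved (legal) characters, while only the braces are escaped.
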
