hc-httpclient-users mailing list archives

From Roland Weber <http-as...@dubioso.net>
Subject Re: Proxy server responding with html body (ProtocolException) on initial connection
Date Sat, 08 Apr 2006 14:45:17 GMT
Hello Henrich,

> >> "GET http://myserver/foo.xml HTTP/1.1[\r][\n]"
> >> "User-Agent: Jakarta Commons-HttpClient/3.0[\r][\n]"
> >> "Host: sus-or1vsrte[\r][\n]"
> >> "Proxy-Connection: Keep-Alive[\r][\n]"
> >> "[\r][\n]"
> << "<Html><Body><H1>Unauthorized ...</H1></Body></Html>[\r][\n]"
> << "null[\r][\n]"
> The response is an HTTP protocol violation and the method throws an
> exception.

Hmmm. One might call that a protocol violation, because it doesn't
follow the rules of the HTTP protocol. One might also call that a
pile of garbage, since there is not even an attempt to send back
anything that remotely resembles an HTTP message.
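For comparison, RFC 2616 (section 6.1) requires the first line of a response to be a status line of the form "HTTP-Version SP Status-Code SP Reason-Phrase". The Java sketch below shows the kind of check that the first line of the response above fails. The class name StatusLineCheck is made up for the example, this is not HttpClient's actual parser, and it is slightly more lenient than the grammar because it accepts a missing reason phrase.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: checks a line (CRLF already stripped) against the shape of an
// RFC 2616 Status-Line: HTTP-Version SP Status-Code SP Reason-Phrase.
public class StatusLineCheck {
    // HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT ; Status-Code = 3DIGIT
    // Lenient: the " Reason-Phrase" part is treated as optional.
    private static final Pattern STATUS_LINE =
            Pattern.compile("HTTP/(\\d+)\\.(\\d+) (\\d{3})(?: (.*))?");

    /** Returns the status code, or -1 if the line is not a status line at all. */
    static int statusCode(String line) {
        Matcher m = STATUS_LINE.matcher(line);
        return m.matches() ? Integer.parseInt(m.group(3)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(statusCode("HTTP/1.1 407 Proxy Authentication Required"));
        System.out.println(statusCode("<Html><Body><H1>Unauthorized ...</H1></Body></Html>"));
    }
}
```

A well-behaved proxy asking for credentials would send something like the first line; the HTML in the trace is not a status line, so there is nothing for a parser to report a status code from.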

> I believe there is no good way to provide the user with diagnostic
> information (unless I change the HttpClient source (readStatusLine()) as
> indicated in
> http://www.mail-archive.com/httpclient-user@jakarta.apache.org/msg01012.html,
> which describes a somewhat similar situation).
> Is this correct?

Yes, that is correct.

> Shouldn't this be enhanced in the main line as
> described in the prior posting?
> It seems to me that one would want to make more information available to
> the user when initial connections cannot be created.

No, not really. The API makes information available if information
is sent back following the rules of the HTTP protocol. If garbage
is sent back from the server or proxy, we could only add garbage to
the exception message. In your case this garbage happens to be
harmless, but why should we take a chance? What guarantees that the
server won't send one megabyte of binary garbage that does not
contain a newline character? Should we add that to the exception
message? I don't think so.
Should we add a heuristic to distinguish probably harmless garbage
from possibly malicious garbage that is sent back? I don't think so
either. HttpClient is for HTTP communication. We build in *minor*
tweaks for frequent server misbehavior, such as sending empty lines
between messages. But we don't build in tweaks for all kinds of
server misbehavior, and in particular not for servers that don't
even speak HTTP in their answer.
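A minimal sketch of what such a minor tweak might look like, assuming a hypothetical readStatusLine helper (not HttpClient's own code) and an arbitrary limit of ten blank lines: empty lines before the status line are skipped, while a line that does not start with "HTTP/" is rejected with an exception whose message does not echo the received content.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical sketch: tolerate a bounded number of empty lines before the
// status line (a frequent misbehavior), but reject anything that is not HTTP.
public class LenientStatusLineReader {
    // Hypothetical limit on how many empty lines are skipped.
    static final int MAX_BLANK_LINES = 10;

    static String readStatusLine(BufferedReader in) throws IOException {
        // Up to MAX_BLANK_LINES empty lines, plus one read for the status line.
        for (int skipped = 0; skipped <= MAX_BLANK_LINES; skipped++) {
            String line = in.readLine();
            if (line == null) {
                throw new IOException("Connection closed before a status line was received");
            }
            if (line.isEmpty()) {
                continue; // tolerated: empty line between messages
            }
            if (line.startsWith("HTTP/")) {
                return line;
            }
            // Not HTTP at all: fail without copying the garbage into the message.
            throw new IOException("The server did not send a valid HTTP status line");
        }
        throw new IOException("Too many empty lines before the status line");
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new StringReader("\r\n\r\nHTTP/1.1 200 OK\r\n"));
        System.out.println(readStatusLine(in));
    }
}
```

Note that BufferedReader.readLine() places no bound on line length, and a real client reads bytes rather than characters, so an actual implementation would also have to cap how much it buffers; that is exactly the one-megabyte-without-a-newline case mentioned above.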
If the user is a developer, all information is available through
the wire log. If the user is not a developer, what would be the
point of showing her garbage?
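For reference, the wire log can be enabled through Commons Logging system properties, as described in the HttpClient 3.x logging guide. The sketch below assumes the SimpleLog backend (Log4j or java.util.logging are configured differently) and must run before any HttpClient class is used, since the loggers read their settings when they are created. The class and method names are made up for the example.

```java
// Sketch assuming Commons Logging with the SimpleLog backend on the classpath.
public class EnableWireLog {
    static void enableWireLog() {
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // "httpclient.wire" logs everything sent and received, like the trace quoted above.
        System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
        // Context logging from the HttpClient classes themselves.
        System.setProperty(
                "org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient", "debug");
    }

    public static void main(String[] args) {
        enableWireLog();
        // ... then create the HttpClient and execute the method as usual ...
    }
}
```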

> Perhaps ignored status
> lines should be captured in the exception thrown.

There is no ignored status line. There is garbage sent back instead
of a status line. Putting garbage anywhere but in the trashcan is a
risk, since the garbage could be malicious. Adding code to distinguish
between various kinds of garbage that may or may not contain sensible
information about the reason for an error is IMHO a waste of time.

HttpComponents will have a more flexible structure, which will also
make it easier to plug in a custom parser that generates different
error reports. I think our time is better spent working on 4.0.
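To illustrate the idea, here is a hypothetical sketch of such a plug-in point in plain Java. StatusLineParser, StrictParser and DiagnosticParser are invented for this example and are not the HttpComponents 4.0 API. The default parser stays strict and keeps garbage out of its error message; an application that knows its proxy could opt in to a parser that reports a short excerpt, truncated and restricted to printable ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a pluggable status line parser. None of these types
// are part of HttpClient 3.x or the HttpComponents 4.0 API.
public class PluggableParserSketch {

    /** Hypothetical plug-in point: turns the first response line into a status code. */
    interface StatusLineParser {
        int parse(String line) throws ParseException;
    }

    static class ParseException extends Exception {
        ParseException(String message) {
            super(message);
        }
    }

    // HTTP-Version SP Status-Code [SP Reason-Phrase]
    static final Pattern STATUS_LINE = Pattern.compile("HTTP/\\d+\\.\\d+ (\\d{3})(?: .*)?");

    /** Default: strict, and the error message never echoes what was received. */
    static class StrictParser implements StatusLineParser {
        public int parse(String line) throws ParseException {
            Matcher m = STATUS_LINE.matcher(line);
            if (!m.matches()) {
                throw new ParseException("No valid HTTP status line received");
            }
            return Integer.parseInt(m.group(1));
        }
    }

    /** Opt-in: appends a short excerpt, truncated and limited to printable ASCII. */
    static class DiagnosticParser extends StrictParser {
        static final int MAX_EXCERPT = 80; // hypothetical limit

        @Override
        public int parse(String line) throws ParseException {
            try {
                return super.parse(line);
            } catch (ParseException e) {
                String excerpt = line.length() > MAX_EXCERPT
                        ? line.substring(0, MAX_EXCERPT) : line;
                excerpt = excerpt.replaceAll("[^\\x20-\\x7E]", "?");
                throw new ParseException(e.getMessage() + "; first line was: " + excerpt);
            }
        }
    }

    public static void main(String[] args) {
        StatusLineParser parser = new DiagnosticParser();
        try {
            parser.parse("<Html><Body><H1>Unauthorized ...</H1></Body></Html>");
        } catch (ParseException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the excerpt behind an explicit opt-in leaves the default behavior as argued above, while a developer who controls the proxy can still get a readable report.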


