hc-dev mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: Patch Submission for Header parsing error
Date Sun, 11 Jan 2004 12:42:51 GMT
Andrew,
The response produced by the web server is clearly wrong and is in
violation of the HTTP spec. I personally doubt that HttpClient can be
expected to provide a workaround for every single problem caused by
every single crappy web server out there. What if the server sent two
extraneous bytes or more?

I do understand that developers of HTTP spiders do need to deal with
broken or non-compliant web servers. Just recently we have had a lengthy
and at times very animated discussion with another developer of HTTP
crawler software regarding a somewhat similar problem in the HttpParser class.
Still I strongly disagree that it is feasible for the stock version of
HttpClient to be able to work around all the 'exotic' protocol
violations. In my opinion the problem can better be addressed by a
generic plug-in mechanism which would allow custom implementations of
HttpParser to enhance HttpClient capabilities to recover from
application-specific HTTP protocol violations.
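
To make the plug-in idea concrete, here is a minimal sketch of what a pluggable header parser could look like. All names in it (HeaderLineParser, StrictHeaderLineParser, LenientHeaderLineParser, HeaderReader) are hypothetical and are not part of the HttpClient API; it only illustrates the general shape, with strict parsing as the default and a lenient implementation that a crawler could supply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strategy interface; NOT part of the HttpClient API.
interface HeaderLineParser {
    // Returns {name, value}, or null if the line should be ignored.
    String[] parse(String line);
}

// Strict behaviour: a line without a "name:" prefix is a protocol violation.
final class StrictHeaderLineParser implements HeaderLineParser {
    public String[] parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 1) {
            throw new IllegalArgumentException("Unable to parse header: " + line);
        }
        return new String[] {
            line.substring(0, colon).trim(),
            line.substring(colon + 1).trim()
        };
    }
}

// Lenient variant a crawler might plug in: skip junk lines instead of failing.
final class LenientHeaderLineParser implements HeaderLineParser {
    private final HeaderLineParser strict = new StrictHeaderLineParser();

    public String[] parse(String line) {
        try {
            return strict.parse(line);
        } catch (IllegalArgumentException e) {
            return null; // tolerate the violation and move on
        }
    }
}

// Reads header lines up to the blank line, delegating to the chosen parser.
// Folded (continuation) header lines are deliberately not handled here.
final class HeaderReader {
    private final HeaderLineParser parser;

    HeaderReader(HeaderLineParser parser) {
        this.parser = parser;
    }

    List<String[]> readAll(List<String> lines) {
        List<String[]> headers = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                break; // an empty line ends the header block
            }
            String[] header = parser.parse(line);
            if (header != null) {
                headers.add(header);
            }
        }
        return headers;
    }
}
```

With this shape the stock client keeps its strict behaviour, while an application that has to cope with a particular broken server passes its own parser to the reader.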

There is already a feature request filed. Have a look and feel free to
contribute your ideas if you are in agreement with the suggested approach:

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25468

Oleg

 



On Fri, 2004-01-09 at 22:04, Andrew W. Buchanan wrote:
> I've been encountering a frequent problem with the 2.0-rc2 release in the 
> spider I'm working on where the HttpParser throws an exception when an extra 
> byte is returned from a web server. When this exception is thrown, none of 
> the Headers are returned even though they all contained valid data.
> 
> 
> An example packet from Ethereal is attached.
> 
> As you can see, there is an extraneous byte (0x00) being sent that is causing 
> the problem.
> 
> I've attached a quick and dirty patch to fix this. There was already a test 
> looking for a length < 1 in order to skip processing. Rather than 
> specifically looking for this case, I simply changed the check to look for a 
> length < 2 on the grounds that there could never be a valid header of one 
> character anyway. The patch is against HEAD, but would probably apply to 
> 2.0-rc2 release cleanly.
> 
> Let me know what you think.
> 
> Let me know if this is the wrong place to post this!
> 
> Andrew Buchanan
> 
> ______________________________________________________________________
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-httpclient-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-httpclient-dev-help@jakarta.apache.org
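
The length check discussed in this thread can be illustrated with a small sketch. The helper below is hypothetical and is not the actual HttpParser code, which does not appear in the thread. It only shows why raising the threshold from 1 to 2 handles one stray byte but not two, and how a structural check (is there a non-empty name before a colon?) behaves instead.

```java
// Hypothetical helper; NOT the actual HttpParser implementation.
final class HeaderLineFilter {

    // Mirrors the kind of check described in the quoted message:
    // lines shorter than minLength are not processed as headers.
    static boolean shouldSkip(String line, int minLength) {
        return line.length() < minLength;
    }

    // Alternative (an assumption, not from the thread): judge the line by
    // its structure rather than its length.
    static boolean looksLikeHeader(String line) {
        return line.indexOf(':') > 0;
    }
}
```

Under these assumptions, a line holding a single 0x00 byte is processed with a threshold of 1 and skipped with a threshold of 2, while a line holding two such bytes still gets through at a threshold of 2. The structural check rejects both.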



