hc-dev mailing list archives

From Oleg Kalnichevski <o.kalnichev...@dplanet.ch>
Subject Re: CRLF and Connection: close
Date Thu, 20 Mar 2003 22:43:19 GMT

First of all, many thanks for bringing these issues up.

> The first thing has to do with the specs for Internet text messages (RFC 822), 
> which HTTP messages are. Apparently this is not really followed, at least in 
> terms of line termination. It is pretty clear that every line is supposed to 
> end with CRLF (\r\n), yet even a brief look at real-world messages will show 
> you that this is routinely ignored. In fact, it seems that a majority of 
> messages are terminated only with LF (\n).

Clearly, this problem needs to be addressed: the code in
HttpConnection#readRawLine and HttpConnection#readLine needs to be made
more robust when dealing with HTTP servers that do not follow the RFC 822
line-termination rules. I'll work on a fix tomorrow (it's getting kind of
late here).
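
To illustrate the kind of tolerance meant here, a minimal sketch of a line
reader that accepts either CRLF or a bare LF as the terminator. This is not
the actual HttpConnection code; the class name LenientLineReader and its
readLine method are hypothetical, and ISO-8859-1 is assumed for decoding
header bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient: reads lines that end in
// either CRLF or a bare LF.
final class LenientLineReader {
    private LenientLineReader() {}

    /**
     * Reads bytes up to and including the next LF, or up to end of stream.
     * Returns the line without its terminator, or null if the stream was
     * already at its end.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                break; // LF ends the line whether or not a CR preceded it
            }
            buf.write(b);
        }
        if (!sawAny) {
            return null; // nothing left to read
        }
        byte[] raw = buf.toByteArray();
        int len = raw.length;
        if (len > 0 && raw[len - 1] == '\r') {
            len--; // drop one trailing CR, if present
        }
        return new String(raw, 0, len, StandardCharsets.ISO_8859_1);
    }
}
```

Treating LF as the real terminator and the preceding CR as optional means a
compliant server and a LF-only server both produce the same parsed lines.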


> I can provide example URLs, if this will help.

Please do so. It will help me with testing.

> The second thing has to do with how Keep-alive connections behave. This is a 
> multi-threaded app, using MultiThreadedHttpConnectionManager. It works great, 
> however I don't get much benefit of the shared Connections, because I'm not 
> connecting to the same site more than once, generally. That's OK, the problem 
> I run into is that after running for not very long, I suddenly start getting 
> everything timing out. It's hard to really pinpoint the timing, given all the
> activity, and no thread identifiers in the log messages, but I think what is 
> happening is that the system is simply running out of file handles or 
> system-level connections. A quick "netstat -n" shows a whole bunch of open, 
> TIME_WAIT, and other connections. It seems that the Connection Manager is 
> keeping them around for re-use, and following HTTP/1.1. One fix was to send 
> "Connection: close" as a RequestHeader, which really fixed things up, but now 
> I am running into sites that are not responding, and not timing out. The log 
> traces into ReadRawLine() and just sits there. I am still tracking this down, 
> I just wonder if anyone else has seen this also?

The definitive authority on this issue would be Mike Becke. Let's wait
for his comment on it.
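
In the meantime, a rough sketch of the two ideas in the quoted report, using
plain java.net sockets rather than HttpClient itself: send "Connection: close"
so the connection is not kept around, and set a read timeout so that a server
that never answers raises SocketTimeoutException instead of blocking forever
in a read. The class name TimedFetch, the method fetchStatusLine, and its
parameters are all hypothetical, and this may well differ from how the
connection manager should handle it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
final class TimedFetch {
    private TimedFetch() {}

    /**
     * Sends a GET request with "Connection: close" and returns the first
     * line of the response. A silent server causes the read to fail with
     * java.net.SocketTimeoutException after readTimeoutMs milliseconds.
     */
    static String fetchStatusLine(String host, int port, String path,
                                  int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            socket.setSoTimeout(readTimeoutMs); // bound every blocking read

            String request = "GET " + path + " HTTP/1.1\r\n"
                    + "Host: " + host + "\r\n"
                    + "Connection: close\r\n" // ask the server not to keep the connection alive
                    + "\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.ISO_8859_1));
            out.flush();

            // Read up to the first LF, skipping any CR bytes.
            InputStream in = socket.getInputStream();
            StringBuilder line = new StringBuilder();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                if (b != '\r') {
                    line.append((char) b);
                }
            }
            return line.toString();
        } // try-with-resources closes the socket on success and on failure
    }
}
```

Closing the socket on every path, including the timeout path, is what keeps
handles from piling up when many different hosts are contacted once each.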


