commons-dev mailing list archives

From "Remy Maucherat" <>
Subject Re: [httpclient] Lots of patches and discussion
Date Thu, 14 Feb 2002 18:17:05 GMT
> Please excuse this really long post, but there's a lot to cover.
> First, let me introduce myself.  My name is Marc Saegeser and I've been a
> committer on the Jakarta-Tomcat project for over a year.  I was the release
> manager for the Tomcat 3.2.2-3.2.4 releases.  I'm not currently a committer
> on any Jakarta-Commons projects.

Welcome :)

> Now a question.  What is the status of the HttpClient 2.0 release?  The code
> is currently tagged alpha 1 but the RELEASE_PLAN_2_0.txt document hasn't
> been modified since October, 2001.  I ask because, depending on how close
> an actual release is, some of the changes that I'm proposing should
> be made on a separate branch.

Maybe Rodney could comment on that.

> Here's my story.  I have need of something like HttpClient in my product and
> I found that I had to extend it somewhat.  The extensions are very generic
> and I believe useful to others so I'd like to add them to the HttpClient
> project.  I also found several bugs that I fixed along the way.  I've
> documented the changes below.
> I need to be able to use HttpClient (or a derivative) to navigate around the
> web pretty much like a regular user-agent.  I want to be able to access any
> site and any web application that I can reach with a reasonably modern
> browser.  HttpClient does a good job of implementing the client side of RFC
> 2616.  Unfortunately, there are lots of sites and some very big name
> applications that do not implement the server side correctly.  Some sites
> (Yahoo! in particular) actually require a broken client implementation
> to log in.  Here are two examples of things I've found so far.
> RFC2616/10.3.3 forbids changing a 302 redirected POST method into a GET
> method but acknowledges that most clients are broken in this regard (this is
> the failure that Yahoo! requires).  I have found sites that send relative
> URLs in the Location: header of a redirect (this violates RFC2616/14.30).
> Supporting these sites will require 'breaking' HttpClient.  I propose adding
> some kind of flag to put HttpClient into a 'compatibility mode' that
> implements this and any other required broken behaviour.

That sounds reasonable.
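
To make the two quoted examples concrete, here is a rough sketch of what such a
compatibility mode could decide on a redirect. It is only an illustration:
the class RedirectCompat, its method names, and the compatibilityMode flag are
hypothetical and are not part of HttpClient or of the attached patches. It
relies only on java.net.URI from the JDK.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of HttpClient. */
final class RedirectCompat {
    private RedirectCompat() {}

    /**
     * Resolves a Location header value against the URL of the request that
     * was redirected. An absolute value is returned unchanged; a relative
     * value (which RFC 2616 section 14.30 does not allow) is resolved
     * against the request URL instead of being rejected.
     */
    static URI resolveLocation(URI requestUrl, String location) {
        return requestUrl.resolve(location.trim());
    }

    /**
     * Picks the method for the redirected request. In compatibility mode a
     * 302 response to a POST is followed with GET, as most browsers do;
     * otherwise the original method is kept.
     */
    static String redirectMethod(int status, String originalMethod,
                                 boolean compatibilityMode) {
        if (compatibilityMode && status == 302 && "POST".equals(originalMethod)) {
            return "GET";
        }
        return originalMethod;
    }
}
```

With the flag off, redirectMethod() never changes the method, so the strict
behaviour stays the default and the 'broken' behaviour is opt-in.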

> A second need is to provide a mechanism for getting user acknowledgment for
> certain actions.  For example, when redirecting from secure to non-secure
> sites.
> I am going to start working on these changes next but I want to discuss them
> with the HttpClient community to see if they feel they belong in the
> HttpClient project or if the project should be forked.
> Anyway, below is a description of the modified and new files.  The patches
> and new files are attached.
> Modified files...
>   -  Added support for old Netscape cookies.  The biggest difference is that
> the test for valid domains is different for Netscape cookies and RFC 2109
> cookies.
>   -  Added space after the semicolons separating the values.  This is
> required by sites that only implement the old Netscape cookie specification.
>   -  Added additional date format for expiration times.
>   -  The write*() and print*() methods now throw HttpRecoverableException.
>   -  Added a new exception class, HttpRecoverableException.  There are
> error conditions that we can try to recover from internally.  The biggest
> one I found was when a server unexpectedly closed the socket.  In this case
> we should just try to re-open the connection and try the request again.
>   -  Fixed a problem with the handling of 100 status codes.  If we get a 100
> after we've already sent the request body, RFC 2616 states that the response
> should be ignored.  The current implementation incorrectly broke out of
> the loop looking for the response.
>   -  Always recreate the cookie header.  A redirect response may have
> included additional cookies that we need to send with the redirected request
> and the path may have changed thus requiring a different cookie set.
>   -  Fixed readResponseBody implementation.  A new version of this function
> also takes an output stream.  This makes it easier for subclasses to use
> this implementation directly instead of having to re-implement it in order
> to support things like saving the response to a file.
>   -  Better support for responses that don't contain a Content-Length or
> Transfer-Encoding header.  By the specification, if these headers are both
> absent, the response has no body content.  In the real world what this means
> is that the server probably didn't know the length when the response was
> committed.  It just sends the response and closes the connection when the
> body is complete.  This assumption falls apart when we get a response that
> *cannot* contain a body.  In this case, the simple implementation keeps
> reading looking for a response body and actually ends up reading the next
> response headers as the body.  I've added a list of responses that,
> according to the specification, cannot ever have a body and fixed
> readResponseBody() to not read a body for these responses.
>   -  Added getPath() method.  This method returns the path portion of a
> given URL.  The only difference from is that this
> method returns "/" if the URL's path is empty.
>   -  Switched to new HttpMethodBase.readResponseBody().
> New files...
>   -  Replacement for HttpClient.  This class serves two purposes.  First, it
> handles off-site redirects.  Second, it is intended to be used within a
> multithreaded application that, like a browser, may have more than one
> request outstanding to a given server and have requests going to more than
> one server.
>   -  Since HttpMultiClient, unlike HttpClient, simultaneously handles
> requests for multiple servers it can't use HttpMethod classes directly
> because they only include path information, not server information.  A new
> interface, HttpUrlMethod, is used that extends HttpMethod.
>   -  A simple wrapper around HttpState to synchronize access to data.  This
> is required to support the multi-threaded nature of HttpMultiClient.
>   -  This is actually the heart of HttpMultiClient.  It keeps track of
> available HttpConnections for host:port combinations.  The number of
> connections to a given host:port is limited (per RFC 2616) and if the limit
> is reached calls to getConnection() will block until a connection becomes
> available.
>   -  Extends HttpException.  This exception is thrown when a potentially
> recoverable error has occurred (e.g. a socket connection was closed
> unexpectedly).  Higher level code can attempt to try the operation again.
>   -  An interface that extends HttpMethod.  HttpUrlMethod classes are
> initialized with a fully qualified URL instead of just the path component.
>   -  These classes extend their respective method classes and implement
> HttpUrlMethod.
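
The per-host:port connection limit with a blocking getConnection() described
in the quoted list could be approximated along these lines. This is a minimal
sketch, not the attached class: HostConnectionLimiter and its methods are
made-up names, it hands out permits rather than real HttpConnection objects,
and it assumes a JDK with java.util.concurrent (Java 8 or later). RFC 2616
section 8.1.4 suggests at most 2 connections per server for a single-user
client, so 2 is a natural value for maxPerHost.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of a per-host:port connection limit. */
final class HostConnectionLimiter {
    private final int maxPerHost;
    private final ConcurrentHashMap<String, Semaphore> permits =
            new ConcurrentHashMap<>();

    HostConnectionLimiter(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    // One fair semaphore per host:port key, created on first use.
    private Semaphore forHost(String host, int port) {
        return permits.computeIfAbsent(host + ":" + port,
                key -> new Semaphore(maxPerHost, true));
    }

    /** Blocks until a connection slot for host:port is free. */
    void acquire(String host, int port) throws InterruptedException {
        forHost(host, port).acquire();
    }

    /** Non-blocking variant: returns false if the limit is reached. */
    boolean tryAcquire(String host, int port) {
        return forHost(host, port).tryAcquire();
    }

    /** Returns a slot; call once for each successful acquire. */
    void release(String host, int port) {
        forHost(host, port).release();
    }
}
```

A caller would pair acquire() with release() in a try/finally block so that a
failed request does not leak a slot.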

From my point of view, these changes are fine as they don't seem to modify
the API too much (and if they did, that wouldn't be a big problem to me, as
I'm still using the HTTP client 1.0), and add some useful functionality.
I would be ok directly modifying HttpMethod, but I definitely could
understand if some didn't agree.
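
For reference, the quoted fix about responses that cannot carry a body comes
down to a small check before reading from the connection. The sketch below
follows RFC 2616 section 4.3 (responses to HEAD, all 1xx, 204 and 304 never
include a message body); the class and method names are hypothetical and are
not taken from the patch.

```java
/** Hypothetical helper; names are illustrative only. */
final class ResponseBodyRules {
    private ResponseBodyRules() {}

    /**
     * Returns false for responses that must not include a message body,
     * so the caller should not try to read one even when Content-Length
     * and Transfer-Encoding are both absent.
     */
    static boolean mayHaveBody(String requestMethod, int status) {
        if ("HEAD".equals(requestMethod)) {
            return false; // responses to HEAD never carry a body
        }
        if (status >= 100 && status < 200) {
            return false; // informational responses
        }
        return status != 204 && status != 304; // No Content, Not Modified
    }
}
```

Only when mayHaveBody() returns true does it make sense to fall back to
reading until the server closes the connection.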

