cocoon-dev mailing list archives

From Ugo Cei <...@apache.org>
Subject Supporting "conditional GET" in Cocoon
Date Tue, 27 Dec 2005 11:05:54 GMT
Given the time of year, I'm afraid this message will fall on deaf  
ears, but anyway ...

I was recently startled to discover that there's apparently no easy  
way to perform a proper "conditional GET" [1] using Cocoon's sources.  
I wonder: didn't anybody ever try to implement an RSS aggregator or
other kind of HTTP client that frequently requests seldom-changing
Web resources? And if someone did, didn't they mind blindly
fetching the whole resource every time, even when not necessary?

Anyway, I just needed this and tried to see what could be done. And  
of course, I wanted to exploit Cocoon's caching mechanism to store  
the contents of already fetched resources. This turned out to be  
harder than expected, due in part to the way checking the validity of  
sources works, but mostly to my own ignorance of the subject.

First of all, I thought that the best place to implement this behavior
was in a Source object. This seems to me to be the correct choice,  
but it has one potentially negative side-effect. More on this later.

I also decided to exploit the SourceValidity interface. After all,  
it's there for this very purpose. Unfortunately, this is where things  
turned out to be not so simple. To understand why, here is a  
description of how my first attempt worked:

1. A generator requests an HTTP resource.
2. A suitable factory provides a new instance of class HttpSource.
3. The new Source's getInputStream method is called: this uses  
Jakarta Commons HttpClient to fetch the requested URL.
4. The new Source's getValidity method is called: this returns a new
HttpSourceValidity object containing the values from the
Last-Modified and ETag response headers, if present.
5. The same HTTP resource is requested again.
6. The SourceValidity object associated with the previous request is
recovered and its isValid method is called.
7. The HttpSourceValidity implementation of the method uses the
stored Last-Modified and ETag values to perform a proper conditional
GET. Here, one of two things might happen:

8a. A "304 Not Modified" status is returned. isValid returns VALID  
and Cocoon uses the cached version. Everybody is happy.

8b. A "200 OK" status is returned, as the original resource has  
perhaps been modified. isValid returns INVALID and Cocoon calls the  
Source's getInputStream method anew. Everybody is NOT happy, because  
the original resource has been fetched twice: once by the  
SourceValidity and once by the Source itself.

You see, the problem is that there's no easy way for the  
SourceValidity to tell Cocoon that it should reuse what has just been  
retrieved.
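
To make the double fetch in step 8b concrete, here is a minimal sketch.
It is not the code from the patch: Validators, Transport, FakeServer and
ConditionalGetCheck are hypothetical names, the server is simulated in
memory, and the VALID/INVALID constants are only assumed to mirror the
ones on SourceValidity. It uses current Java syntax for brevity.

```java
import java.util.HashMap;
import java.util.Map;

final class DoubleFetchDemo {
    public static void main(String[] args) {
        String url = "http://example.org/feed.xml"; // placeholder URL
        FakeServer server = new FakeServer();
        Validators stored = new Validators(null, server.currentEtag);

        // Unchanged resource: the check costs no full response (case 8a).
        System.out.println(ConditionalGetCheck.isValid(server, url, stored));

        // Changed resource: the check itself receives a full response (case 8b) ...
        server.currentEtag = "\"v2\"";
        System.out.println(ConditionalGetCheck.isValid(server, url, stored));
        // ... and the Source then fetches the same body a second time.
        server.get(url, new HashMap<>());
        System.out.println("full responses: " + server.fullResponses);
    }
}

/** Validators remembered from an earlier response; either may be null. */
final class Validators {
    final String lastModified;
    final String etag;

    Validators(String lastModified, String etag) {
        this.lastModified = lastModified;
        this.etag = etag;
    }

    /** Headers that turn a plain GET into a conditional GET. */
    Map<String, String> toRequestHeaders() {
        Map<String, String> headers = new HashMap<>();
        if (lastModified != null) headers.put("If-Modified-Since", lastModified);
        if (etag != null) headers.put("If-None-Match", etag);
        return headers;
    }
}

/** Hypothetical transport: performs a GET and returns the status code. */
interface Transport {
    int get(String url, Map<String, String> headers);
}

/** Hypothetical in-memory server that counts full (200) responses. */
final class FakeServer implements Transport {
    String currentEtag = "\"v1\"";
    int fullResponses = 0;

    @Override
    public int get(String url, Map<String, String> headers) {
        if (currentEtag.equals(headers.get("If-None-Match"))) {
            return 304; // Not Modified: no body is sent
        }
        fullResponses++; // a complete body is sent
        return 200;
    }
}

final class ConditionalGetCheck {
    static final int VALID = 1;    // assumed to mirror SourceValidity.VALID
    static final int INVALID = -1; // assumed to mirror SourceValidity.INVALID

    /** Steps 6 to 8: one conditional GET decides whether the cache is usable. */
    static int isValid(Transport transport, String url, Validators stored) {
        int status = transport.get(url, stored.toRequestHeaders());
        return status == 304 ? VALID : INVALID;
    }
}
```

In the changed case the counter goes up once for the validity check and
once more for the Source's own fetch, which is exactly the waste
described above.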

I could have used a HEAD request in the SourceValidity. This would
have saved some bandwidth, but unless the server is particularly
smart it would still have had to compute the response twice. And
doing two HTTP requests when one suffices does not seem quite optimal.

So I thought really hard about the problem and came up with a  
(hopefully) brilliant solution: Use a ThreadLocal. The  
HttpSourceValidity will store in a ThreadLocal the response data  
(actually an instance of HttpClient's GetMethod class) and the  
HttpSource will use it later, in the same request and hence in the  
same thread, to provide an InputStream for reading.
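
The hand-off can be sketched roughly as follows. This is a simplified
illustration, not the patch itself: PendingResponse, SketchSource and
Fetcher are hypothetical names, and the sketch stores the response body
as a byte array where the patch stores HttpClient's GetMethod instance.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

final class ThreadLocalHandOffDemo {
    public static void main(String[] args) {
        SketchSource source = new SketchSource(
                "http://example.org/feed.xml",          // placeholder URL
                url -> "fetched by source".getBytes()); // stand-in for a real GET

        // The validity check got a 200 and parks the body for this thread.
        PendingResponse.store("fetched by validity".getBytes());

        // The source picks the parked body up instead of fetching again.
        InputStream in = source.getInputStream();
        System.out.println(in != null);
    }
}

/** Hypothetical per-thread slot for a response fetched during the validity check. */
final class PendingResponse {
    private static final ThreadLocal<byte[]> BODY = new ThreadLocal<>();

    /** Called by the validity check when the server answers with a new body. */
    static void store(byte[] body) {
        BODY.set(body);
    }

    /** Called by the source: returns the parked body once, then clears the slot. */
    static byte[] take() {
        byte[] body = BODY.get();
        BODY.remove(); // avoid leaking stale data into the next request on this thread
        return body;
    }
}

/** Hypothetical source that prefers a parked response over a new request. */
final class SketchSource {
    interface Fetcher {
        byte[] fetch(String url);
    }

    private final String url;
    private final Fetcher fetcher;

    SketchSource(String url, Fetcher fetcher) {
        this.url = url;
        this.fetcher = fetcher;
    }

    InputStream getInputStream() {
        byte[] body = PendingResponse.take();
        if (body == null) {
            body = fetcher.fetch(url); // nothing parked: do a normal fetch
        }
        return new ByteArrayInputStream(body);
    }
}
```

Clearing the slot on read matters because servlet containers usually
reuse threads across requests, so a leftover value could otherwise be
served to an unrelated request.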

I've provided a patch for this (see
http://issues.apache.org/jira/browse/COCOON-1726) against the 2.1
branch. Please have a look at it (particularly the FIXME comments) as
I would like some expert advice on the implementation before
finalizing it.

One problem that might arise is due to the fact that with the
cocoon.xconf settings included in the patch, all "http" URIs will be
served by this Source, overriding the default handling by Excalibur's
URLSource. This could change the behavior of existing applications,
but it would strike me as strange to have to use some other
pseudo-protocol (cached-http?).

	Ugo

[1] http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers

-- 
Ugo Cei
Tech Blog: http://agylen.com/
Open Source Zone: http://oszone.org/
Wine & Food Blog: http://www.divinocibo.it/


