cocoon-dev mailing list archives

From Grzegorz Kossakowski <gkossakow...@apache.org>
Subject Re: Broken caching of servlet: source in some cases
Date Sun, 22 Apr 2007 12:47:46 GMT
Alexander Klimetschek wrote:
> Grzegorz Kossakowski wrote:
> 
> Great! I can try it on Monday, have to do other things this weekend.
> 
> But I found out two other problems with the order of methods called on 
> the source.

Huh, you are really good at catching bugs :-)

> 1) A (Resource)Reader will always call getLastModified() before 
> getValidity(), which breaks the caching completely, since it starts a 
> servlet connection without the If-Modified-Since set. But it looks like 
> this could be fixed with your new changes! My first idea was to return 
> -1 in getLastModified() until the real value is known after the 
> connection was executed. But I am not sure if this will break other use 
> cases.

Yes, recent changes should fix that.

> (BTW: The method ServletConnection.connect() should be renamed to call() 
> or execute() - connect sounds like doing only the first step, 
> "establishing a connection", but it actually connects, gets the data and 
> "closes" the connection!)

The naming follows URLConnection; see [1] for an explanation. I think that what connect()
actually does is an implementation detail that other classes should not have to care about.
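
To illustrate the URLConnection convention, here is a minimal sketch. DemoConnection is a hypothetical stand-in (it is not the real ServletConnection, and it never touches the network); only the URLConnection members it uses (connect(), the protected connected flag, setIfModifiedSince()) come from the JDK. It shows that connect() can do the whole exchange as long as repeated calls are ignored, and that If-Modified-Since has to be set before the first call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; not the real ServletConnection. */
final class DemoConnection extends URLConnection {
    int executions = 0;                 // how many times the "request" really ran
    private byte[] body = new byte[0];

    DemoConnection(URL url) {
        super(url);
    }

    /** Performs the whole exchange, but only once; later calls are ignored. */
    @Override
    public void connect() throws IOException {
        if (connected) {
            return;
        }
        executions++;
        body = "response".getBytes(StandardCharsets.UTF_8);
        connected = true;
    }

    @Override
    public InputStream getInputStream() throws IOException {
        connect();                      // callers do not have to call connect() first
        return new ByteArrayInputStream(body);
    }

    /** Builds a connection for a placeholder URL; nothing is sent anywhere. */
    static DemoConnection create() {
        try {
            return new DemoConnection(URI.create("http://localhost/demo").toURL());
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Uses the connection several times and reports how often it executed. */
    static int executionsAfterTwoReads() {
        DemoConnection c = create();
        try {
            c.getInputStream();
            c.connect();
            c.getInputStream();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return c.executions;
    }

    /** Setting If-Modified-Since after connecting is rejected by URLConnection. */
    static boolean rejectsLateIfModifiedSince() {
        DemoConnection c = create();
        try {
            c.connect();
            c.setIfModifiedSince(0L);
            return false;
        } catch (IllegalStateException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The second helper is the part relevant to problem 1): once a connection has been started without the header, it is too late to add it.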

> 2) The other problem happens when the validity is combined with others 
> inside an AggregatedValidity, e.g. when using <map:aggregate />. In that 
> case it is possible that, although the source validity returns valid (and 
> has no response data), the pipeline calls getInputStream(). This happens 
> when the other validities are invalid and the decision is made to 
> retrieve fresh data from all sources. That was the mysterious last 
> bug ;-)
> 
> For this I would propose changing the getInputStream() implementation 
> so that it makes a connection without the If-Modified-Since header set, 
> regardless of whether there already was a connection (started from the 
> isValid method). This will result in two full sitemap processings, but 
> I see no other solution.

Ahhh, aggregation! I had not thought about it while implementing caching.
Basically, I agree with your proposed change and will make it. However, one concern comes to
mind immediately:
Why are sources not cached at all?!
It seems that only whole pipelines (in the "caching" implementation) or pipeline fragments (in
the "caching-point" implementation) are cached, and never the data from the sources themselves.

I guess we'll need a broader discussion about it.
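
To make the cost of the proposed change concrete, here is a rough sketch. All names in it (RefetchSketch, FakeServlet, Source, isStillValid, fetchFresh) are hypothetical and only model the behaviour discussed above; this is not the actual servlet: source code. The validity check sends a conditional request, while fetching data always sends an unconditional one.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical model of the proposed getInputStream() behaviour. */
final class RefetchSketch {
    static final long NO_CONDITION = -1L;   // assumed marker for "no If-Modified-Since"

    /** Stand-in for the called servlet; counts full (non-304) processings. */
    static final class FakeServlet {
        final long lastModified;
        int fullRuns = 0;

        FakeServlet(long lastModified) {
            this.lastModified = lastModified;
        }

        /** Returns null for "304 Not Modified", otherwise the response body. */
        byte[] call(long ifModifiedSince) {
            if (ifModifiedSince != NO_CONDITION && lastModified <= ifModifiedSince) {
                return null;
            }
            fullRuns++;
            return "fresh data".getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Stand-in for the source as seen by the pipeline. */
    static final class Source {
        private final FakeServlet servlet;

        Source(FakeServlet servlet) {
            this.servlet = servlet;
        }

        /** Conditional request; true means the cached copy can be kept. */
        boolean isStillValid(long cachedLastModified) {
            return servlet.call(cachedLastModified) == null;
        }

        /** Always an unconditional request, whatever happened before. */
        byte[] fetchFresh() {
            return servlet.call(NO_CONDITION);
        }
    }

    /** Source is valid, but an aggregate forces a refetch: one full run. */
    static int fullRunsWhenCachedCopyIsCurrent() {
        FakeServlet servlet = new FakeServlet(1000L);
        Source source = new Source(servlet);
        source.isStillValid(2000L);
        source.fetchFresh();
        return servlet.fullRuns;
    }

    /** Source is stale: the validity check and the fetch both run fully. */
    static int fullRunsWhenCachedCopyIsStale() {
        FakeServlet servlet = new FakeServlet(1000L);
        Source source = new Source(servlet);
        source.isStillValid(500L);
        source.fetchFresh();
        return servlet.fullRuns;
    }
}
```

The stale case is where the two full sitemap processings come from, which is also why caching the source data itself looks attractive.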

> I examined all of the caching algorithms in Cocoon while debugging, and 
> here are the important bits and pieces I came up with from the point of 
> view of a Source developer. Some of this is noted on 
> http://cocoon.apache.org/2.1/userdocs/concepts/caching.html but not 
> everything, so I'd like to share it on the list for future work:
> 
> Sources & Caching in Cocoon
> ===========================
> 
> This is the typical order in which org.apache.excalibur.source.Source and 
> SourceValidity methods are called with regard to caching:
> 
> getURI()  <- used as cache key for the cached response + the cached 
> validity
> 
> getLastModified()  <- called only by ResourceReader to set the 
> Last-Modified
>                       header if the value is > 0
> 
> SourceValidity.isValid()  <- called on cached (old) validity if found
>                              in cache
> 
> getValidity()  <- called if the old cached validity returned 0 (UNKNOWN) on
>                   isValid() or for putting the new data into the cache
> 
> SourceValidity.isValid(SourceValidity)
>                <- called on cached validity with the new validity as
>                   parameter
> 
> getInputStream()  <- called when any isValid() method returned -1 (INVALID)
>                      but also when some other information outside the
>                      current source forces new data to be fetched (eg.
>                      when SourceValidity is put into an AggregatedValidity
>                      together with others - one invalid validity makes all
>                      sources invalid!)
> 
> If the isValid(SourceValidity) method returns UNKNOWN, the new validity 
> will be refetched, so getValidity() is called a second time (!).

Thanks for the detailed explanation! I think all of this should be documented somewhere, because
implementing advanced functionality like this is really painful, as we can see.
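
As a starting point for such documentation, the validity part of the order described above could be sketched roughly like this. The Validity interface and the TimestampValidity class are hypothetical simplifications, not the real Excalibur types; the constants use 0 for UNKNOWN and -1 for INVALID as in the list above, with 1 assumed for VALID. For brevity the sketch does not model the second getValidity() call made when isValid(SourceValidity) returns UNKNOWN; it simply treats that case as "do not reuse".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of the validity check order; not Cocoon code. */
final class ValidityCheckSketch {
    static final int VALID = 1;      // assumed value
    static final int UNKNOWN = 0;
    static final int INVALID = -1;

    /** Simplified stand-in for SourceValidity. */
    interface Validity {
        int isValid();
        int isValid(Validity fresh);
    }

    /** A validity that can only decide by comparing with a fresh one. */
    static final class TimestampValidity implements Validity {
        final long timestamp;

        TimestampValidity(long timestamp) {
            this.timestamp = timestamp;
        }

        @Override
        public int isValid() {
            return UNKNOWN;
        }

        @Override
        public int isValid(Validity fresh) {
            if (!(fresh instanceof TimestampValidity)) {
                return INVALID;
            }
            return ((TimestampValidity) fresh).timestamp == timestamp ? VALID : INVALID;
        }
    }

    /** Decides whether the cached response may be reused, recording each call. */
    static boolean canReuse(Validity cached, Supplier<Validity> getValidity,
                            List<String> trace) {
        if (cached == null) {
            return false;                        // nothing in the cache
        }
        trace.add("isValid()");
        int result = cached.isValid();
        if (result == UNKNOWN) {
            trace.add("getValidity()");
            Validity fresh = getValidity.get();
            trace.add("isValid(SourceValidity)");
            result = cached.isValid(fresh);
        }
        return result == VALID;                  // otherwise getInputStream() follows
    }

    /** Convenience wrapper: compares a cached timestamp with the current one. */
    static boolean reuse(long cachedTimestamp, long currentTimestamp) {
        return canReuse(new TimestampValidity(cachedTimestamp),
                () -> new TimestampValidity(currentTimestamp), new ArrayList<>());
    }

    /** Same check, but returns the recorded sequence of calls. */
    static List<String> traceFor(long cachedTimestamp, long currentTimestamp) {
        List<String> trace = new ArrayList<>();
        canReuse(new TimestampValidity(cachedTimestamp),
                () -> new TimestampValidity(currentTimestamp), trace);
        return trace;
    }
}
```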

[1] http://java.sun.com/j2se/1.4.2/docs/api/java/net/URLConnection.html#connect()

-- 
Grzegorz Kossakowski
http://reflectingonthevicissitudes.wordpress.com/
