cocoon-dev mailing list archives

From Grzegorz Kossakowski
Subject Re: Broken caching of servlet: source in some cases
Date Sun, 22 Apr 2007 12:47:46 GMT
Alexander Klimetschek wrote:
> Grzegorz Kossakowski wrote:
> Great! I can try it on Monday, have to do other things this weekend.
> But I found out two other problems with the order of methods called on 
> the source.

Huh, you are really good at catching bugs :-)

> 1) A (Resource)Reader will always call getLastModified() before 
> getValidity(), which breaks the caching completely, since it starts a 
> servlet connection without the If-Modified-Since set. But it looks like 
> this could be fixed with your new changes! My first idea was to return 
> -1 in getLastModified() until the real value is known after the 
> connection was executed. But I am not sure if this will break other use 
> cases.

Yes, recent changes should fix that.

> (BTW: The method ServletConnection.connect() should be renamed to call() 
> or execute() - connect sounds like doing only the first step, 
> "establishing a connection", but it actually connects, gets the data and 
> "closes" the connection!)

The naming follows URLConnection. See [1] for an explanation. I think that what connect()
actually does is an implementation detail, and other classes should not need to care
about it.

> 2) The other problem happens when the validity will be integrated inside 
> an AggregatedValidity together with others, eg. when using 
> <map:aggregate />. In that case it is possible that although the source 
> validity returns valid (and has no response data), the pipeline calls 
> getInputStream(). This is when the other validities are invalid and the 
> decision is made to retrieve fresh new data from all sources. That was 
> the mysterious last bug ;-)
> For this I would propose to change the getInputStream() implementation 
> that it will do a connection without if-modified-since header set 
> regardless if there already was a connection (started from isValid 
> method). This will end in two full sitemap processings, but there seems 
> no other solution to me.

Ahhh, aggregation! I did not think about it while implementing caching.
Basically, I agree with your proposed change and will make it. However, one concern comes to
mind: why are sources not cached at all?!
It seems that only whole pipelines (in the "caching" implementation) or pipeline fragments (in
the "caching-point" implementation) are cached, and never the data from the sources
themselves.

I guess we'll need a broader discussion about it.
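
To make the proposed getInputStream() change concrete, here is a minimal sketch of how I
read it. It is not the actual servlet source code: Response, FakeBackend, SourceSketch,
isStillValid() and the NO_HEADER marker are all hypothetical names made up for
illustration. It only shows that the validity check sends a conditional request, while
getInputStream() always sends a new request without If-Modified-Since.

```java
final class UnconditionalFetchSketch {
    // Hypothetical marker meaning "send no If-Modified-Since header".
    static final long NO_HEADER = -1L;

    /** Hypothetical stand-in for the result of one servlet call. */
    static final class Response {
        final int status;
        final String body; // null when the status is 304 (Not Modified)
        Response(int status, String body) { this.status = status; this.body = body; }
    }

    /** Hypothetical backend that counts calls and honours If-Modified-Since. */
    static final class FakeBackend {
        final long lastModified;
        int calls = 0;
        FakeBackend(long lastModified) { this.lastModified = lastModified; }

        Response call(long ifModifiedSince) {
            calls++;
            boolean conditional = ifModifiedSince != NO_HEADER;
            if (conditional && lastModified <= ifModifiedSince) {
                return new Response(304, null); // still valid, no response data
            }
            return new Response(200, "body@" + lastModified);
        }
    }

    /** Hypothetical source, reduced to the two calls discussed in this thread. */
    static final class SourceSketch {
        private final FakeBackend backend;
        SourceSketch(FakeBackend backend) { this.backend = backend; }

        // Validity check: conditional request, may come back without a body.
        boolean isStillValid(long cachedLastModified) {
            return backend.call(cachedLastModified).status == 304;
        }

        // Proposed behaviour: always a new, unconditional request, regardless
        // of any connection made earlier by the validity check.
        String getInputStream() {
            return backend.call(NO_HEADER).body;
        }
    }

    public static void main(String[] args) {
        FakeBackend backend = new FakeBackend(100L);
        SourceSketch source = new SourceSketch(backend);
        System.out.println(source.isStillValid(100L));
        System.out.println(source.getInputStream());
        System.out.println(backend.calls);
    }
}
```

The cost is visible in the call counter: a validity check followed by getInputStream()
hits the backend twice, which corresponds to the two sitemap processings mentioned above.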

> I evaluated the entire caching algorithms in Cocoon during debugging and 
> here are all the important bits and pieces I came up with from the point 
> of a Source developer. Some is noted on 
> but not 
> everything, so I'd like to share it on the list for future work:
> Sources & Caching in Cocoon
> ===========================
> This is the typical order in which org.apache.excalibur.source.Source and 
> SourceValidity methods are called regarding caching:
> getURI()  <- used as cache key for the cached response + the cached 
> validity
> getLastModified()  <- called only by ResourceReader to set the 
> Last-Modified
>                       header if the value is > 0
> SourceValidity.isValid()  <- called on cached (old) validity if found
>                              in cache
> getValidity()  <- called if the old cached validity returned 0 (UNKNOWN) on
>                   isValid() or for putting the new data into the cache
> SourceValidity.isValid(SourceValidity)
>                <- called on cached validity with the new validity as
>                   parameter
> getInputStream()  <- called when any isValid() method returned -1 (INVALID)
>                      but also when some other information outside the
>                      current source forces new data to be fetched (eg.
>                      when SourceValidity is put into an AggregatedValidity
>                      together with others - one invalid validity makes all
>                      sources invalid!)
> If the isValid(SourceValidity) method returns UNKNOWN, the new validity 
> will be refetched, so getValidity() is called a second time (!).

Thanks for the detailed explanation! I think all of this should be documented somewhere,
because implementing advanced functionality is really painful, as we can see.
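
As a starting point for such documentation, the call order listed above could be sketched
roughly like this. It is a simplified model, not the real Cocoon pipeline code: Validity,
Src, Entry, StampValidity, CountingSource, lookup() and the "servlet:/demo" URI are
hypothetical stand-ins, and read() takes the place of getInputStream(). Only the constant
values (1 = VALID, 0 = UNKNOWN, -1 = INVALID) and the order of calls come from the
description.

```java
import java.util.HashMap;
import java.util.Map;

final class CacheLookupSketch {
    static final int VALID = 1, UNKNOWN = 0, INVALID = -1;

    /** Hypothetical stand-in for SourceValidity. */
    interface Validity {
        int isValid();
        int isValid(Validity fresh);
    }

    /** Hypothetical stand-in for Source; read() replaces getInputStream(). */
    interface Src {
        String getURI();
        Validity getValidity();
        String read();
    }

    static final class Entry {
        final Validity validity;
        final String data;
        Entry(Validity validity, String data) { this.validity = validity; this.data = data; }
    }

    static String lookup(Src source, Map<String, Entry> cache) {
        String key = source.getURI();                   // cache key
        Entry cached = cache.get(key);
        Validity fresh = null;
        if (cached != null) {
            int state = cached.validity.isValid();      // asked on the old validity first
            if (state == UNKNOWN) {
                fresh = source.getValidity();           // new validity needed to decide
                state = cached.validity.isValid(fresh);
            }
            if (state == VALID) return cached.data;     // served from cache, no read()
            if (state == UNKNOWN) fresh = null;         // forces the second getValidity()
        }
        if (fresh == null) fresh = source.getValidity();
        String data = source.read();                    // fetch new data
        cache.put(key, new Entry(fresh, data));
        return data;
    }

    /** Timestamp-style validity: cannot decide alone, compares against a new one. */
    static final class StampValidity implements Validity {
        final long stamp;
        StampValidity(long stamp) { this.stamp = stamp; }
        public int isValid() { return UNKNOWN; }
        public int isValid(Validity fresh) {
            if (!(fresh instanceof StampValidity)) return UNKNOWN;
            return ((StampValidity) fresh).stamp == stamp ? VALID : INVALID;
        }
    }

    /** Source that counts how often each method is called. */
    static final class CountingSource implements Src {
        long stamp;
        int validityCalls = 0, reads = 0;
        CountingSource(long stamp) { this.stamp = stamp; }
        public String getURI() { return "servlet:/demo"; } // hypothetical URI
        public Validity getValidity() { validityCalls++; return new StampValidity(stamp); }
        public String read() { reads++; return "data@" + stamp; }
    }

    public static void main(String[] args) {
        Map<String, Entry> cache = new HashMap<>();
        CountingSource source = new CountingSource(1L);
        System.out.println(lookup(source, cache)); // cache miss: getValidity(), then read()
        System.out.println(lookup(source, cache)); // cache hit: getValidity() only
        source.stamp = 2L;
        System.out.println(lookup(source, cache)); // changed: getValidity(), then read()
    }
}
```

The sketch leaves out getLastModified() and the AggregatedValidity case, where a read can
be forced even though this source's own validity is still valid.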


Grzegorz Kossakowski
