cocoon-dev mailing list archives

From Sylvain Wallez <sylv...@apache.org>
Subject Re: Adding serializer info to SitemapSource
Date Wed, 08 Jun 2005 09:28:20 GMT
Carsten Ziegeler wrote:

>Geert Josten wrote:
>  
>
>>>The caching algorithm is so smart (or complicated?) that it caches a
>>>pipeline based on the components and their configuration, but
>>>independent of the uri. So, if you have the same pipeline twice with a
>>>different serializer and use the internal protocol it's basically the
>>>same pipeline. If you now add the serializer information, you have two
>>>different pipelines with two different cache results.
>>>      
>>>
>>How can a pipeline have different serializers when the only difference is
>>internal or external protocol?
>>    
>>
>This is a simplified example which doesn't make sense:
>
><match pattern="A">
>  <generate src="a.xml"/>
>  <serialize type="xml"/>
></match>
>
><match pattern="B">
>  <generate src="a.xml"/>
>  <serialize type="html"/>
></match>
>  
>

In this case, the difference isn't caused by the use of internal versus 
external requests: calling "http://A" and "cocoon://A" builds the same 
pipeline!

>A more complex sample is if you use non cacheable components:
>
><match pattern="A">
>  <generate src="a.xml"/>
>  <transform type="NOT CACHEABLE"/>
>  <serialize type="xml"/>
></match>
>
><match pattern="B">
>  <generate src="a.xml"/>
>  <transform type="NOT CACHEABLE"/>
>  <transform src="xmlTohtml.xsl"/>
>  <serialize type="html"/>
></match>
>
>The cached part of the two pipelines is the same.
>
>BTW, in this case adding the serializer to the cocoon source would be wrong.
>  
>

The partially cached content would have a key of type 
"PK-G-file-file:/path/a.xml|", which rightly doesn't include the 
serializer as it's not in the cacheable part.

The reason for not including the serializer in the cache key is the case 
where cocoon sources are SAXed: the serializer is then ignored, so the 
restricted key (without the serializer) can be used to cache the SAX 
stream. But when we cache the byte stream, we must include the 
serializer in the key!
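
To illustrate why the byte stream needs the serializer in its key, here is 
a minimal Java sketch. It is not Cocoon code: the class, the method names 
and the key format are all hypothetical, and render() is only a stand-in 
for running a pipeline. It shows two requests for the same source with 
different serializers colliding when the key omits the serializer.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Cocoon.
final class ByteCacheCollisionSketch {

    // Cache of serialized byte streams, indexed by a string key.
    private final Map<String, byte[]> byteCache = new HashMap<>();

    // Stand-in for executing a pipeline: the output is tagged with the
    // serializer type so that we can see which serializer produced it.
    private static byte[] render(String src, String serializerType) {
        return (serializerType + ":" + src).getBytes(StandardCharsets.UTF_8);
    }

    // Returns the (possibly cached) serialized output for src.
    // keyIncludesSerializer selects between the "restricted" key
    // (source only) and the "full" key (source plus serializer).
    String fetch(String src, String serializerType, boolean keyIncludesSerializer) {
        String key = keyIncludesSerializer ? src + "|" + serializerType : src;
        byte[] bytes = byteCache.computeIfAbsent(key, k -> render(src, serializerType));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With keyIncludesSerializer set to false, fetching "a.xml" as "xml" and then 
as "html" on the same instance returns the cached xml bytes both times. 
With it set to true, each serializer gets its own cache entry.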

Now the problem is that key and validity are computed before the 
pipeline is actually executed, i.e. at a time when we don't yet know 
whether the cocoon source will be used with toSAX() or getInputStream().

I think we should therefore include the serializer in the key and 
validity (if the pipeline is fully cacheable). Sure, this adds 
information that is useless for toSAX() calls, but it includes everything 
that is necessary for correct behaviour of getInputStream() calls.

Note that if a pipeline is only used in internal calls, its serializer 
is very likely to be "xml", which is cacheable and always valid, and 
therefore doesn't affect the cacheability of the pipeline.

Let's sum up all this, which gets complicated ;-)
- I'm only considering fully cacheable pipelines (for partially 
cacheable ones, the key stops at the last cacheable component)
- "restricted cache key" means key without the serializer
- "full cache key" means key with the serializer

- SitemapSource should use the full key (its hash, actually) and the full validity
- processing a pipeline to an XML consumer should cache the SAX stream 
using the restricted key and restricted validity
- processing a pipeline to an outputStream should cache the byte stream 
using the full key and full validity
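
The summary above can be sketched in Java roughly as follows. This is only 
an illustration under stated assumptions: PipelineKeySketch, Component, 
restrictedKey() and fullKey() are hypothetical names, not Cocoon classes, 
and the key format is only loosely modelled on the 
"PK-G-file-file:/path/a.xml|" example. The sketch also assumes the 
serializer is the last component in the list.

```java
import java.util.List;

// Hypothetical illustration only; none of these names exist in Cocoon.
final class PipelineKeySketch {

    // One pipeline component and the fragment it contributes to the key.
    static final class Component {
        final String role;       // "G" generator, "T" transformer, "S" serializer
        final String type;       // e.g. "file", "xslt", "xml", "html"
        final String key;        // component-specific key, e.g. the source URI
        final boolean cacheable;

        Component(String role, String type, String key, boolean cacheable) {
            this.role = role;
            this.type = type;
            this.key = key;
            this.cacheable = cacheable;
        }

        boolean isSerializer() {
            return "S".equals(role);
        }

        String keyPart() {
            return role + "-" + type + "-" + key + "|";
        }
    }

    // Restricted key: stops at the first non-cacheable component and
    // never includes the serializer. Suitable for a cached SAX stream.
    static String restrictedKey(List<Component> pipeline) {
        StringBuilder sb = new StringBuilder("PK-");
        for (Component c : pipeline) {
            if (c.isSerializer() || !c.cacheable) {
                break;
            }
            sb.append(c.keyPart());
        }
        return sb.toString();
    }

    // True only if every component, serializer included, is cacheable.
    static boolean fullyCacheable(List<Component> pipeline) {
        for (Component c : pipeline) {
            if (!c.cacheable) {
                return false;
            }
        }
        return true;
    }

    // Full key: the restricted key plus the serializer, but only for a
    // fully cacheable pipeline. Suitable for a cached byte stream.
    static String fullKey(List<Component> pipeline) {
        String restricted = restrictedKey(pipeline);
        if (pipeline.isEmpty() || !fullyCacheable(pipeline)) {
            return restricted;
        }
        Component last = pipeline.get(pipeline.size() - 1);
        return last.isSerializer() ? restricted + last.keyPart() : restricted;
    }
}
```

For the two simple pipelines A and B quoted earlier, restrictedKey() gives 
the same string for both, while fullKey() differs because of the "xml" and 
"html" serializers. For a pipeline containing a non-cacheable transformer, 
both methods return the key up to the last cacheable component.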

Does it make sense?

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director

