cocoon-dev mailing list archives

From Miles Elam <mi...@pcextremist.com>
Subject Re: forced caching of volatile data
Date Wed, 13 Aug 2003 18:44:27 GMT
Gianugo Rabellino wrote:

> Miles Elam wrote:
>
>> It would be possible to add in some code that if the resource has no 
>> validity objects and the pipeline has an expiry, it creates a dummy 
>> validity object that, if asked, always returns "invalid" but has the 
>> expires timestamp set to maintain the entry.
>
> This was the only missing piece yet, and I wanted to tackle it when 
> I'll be back from vacation. I like your approach, so if you feel like 
> coding it, please go ahead. :-) 
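
To make the quoted idea concrete, here is a minimal sketch of such a dummy validity object.  It is an illustration only: `ExpiresOnlyValidity`, its constants, and `mayServeFromCache` are hypothetical names and are not meant to match Cocoon's actual validity interface.

```java
/**
 * Hypothetical stand-in for a validity object (NOT the real Cocoon API).
 * It always answers "invalid" when asked about its content, but carries
 * the pipeline's expiry timestamp so the cache entry can be kept until then.
 */
final class ExpiresOnlyValidity {
    static final int VALID = 1;     // assumed convention for this sketch
    static final int INVALID = -1;  // assumed convention for this sketch

    private final long expiresAtMillis;

    ExpiresOnlyValidity(long expiresAtMillis) {
        this.expiresAtMillis = expiresAtMillis;
    }

    /** Content validity is never claimed: the components were uncacheable. */
    int isValid() {
        return INVALID;
    }

    /** True once the pipeline-level expiry has passed. */
    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }

    /** A cached response may be served only while the expiry has not passed. */
    static boolean mayServeFromCache(ExpiresOnlyValidity validity, long nowMillis) {
        return validity != null && !validity.isExpired(nowMillis);
    }
}
```

The point of the design is that the expiry alone decides whether the entry is served; the validity check itself never vouches for the content.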


I'm running into a snag and was wondering if anyone had some wisdom to 
share.  The current behavior of caching pipelines is to aggregate 
the keys of all cacheable pipeline components and use the result as the 
cache lookup key.  In a pipeline that has uncacheable components 
but also has an expiry, this scheme doesn't work: the key is incomplete or 
nonexistent.  My first thought was to use the request URI, but therein 
lies the snag: any action or selector that changes what is returned for 
that URI would cause the wrong response to be cached and served.  In 90% 
of cases, I don't see this as a problem.  But in cases where different 
formats are sent to different clients for the same URI 
(e.g. XML with an XSLT processing instruction for the newest browsers and 
HTML for older clients)
   <map:pipeline type="caching">
     <map:parameter name="expires" value="access plus 10 minutes"/>

     <map:match pattern="">
       <map:generate src="index.xml"/>
       <map:transform type="hypothetical_uncacheable"/>
       <map:select type="browser">
         <map:when test="ie">
           <map:transform src="proc_inst.xslt">
             <map:parameter name="stylesheet" value="index2html.xslt"/>
           </map:transform>
           <map:serialize type="xml"/>
         </map:when>
         <map:otherwise>
           <map:transform src="index2html.xslt"/>
           <map:serialize type="html"/>
         </map:otherwise>
       </map:select>
     </map:match>
   </map:pipeline>

if the cache key is simply the URI, older clients may end up with raw 
and, in their case, useless XML.
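
The failure mode can be reproduced with a toy cache.  This is only a sketch under assumptions: `UriKeyCollisionDemo`, `render`, and `serve` are made-up names standing in for the pipeline and its cache, and the returned strings merely label the two output formats.

```java
import java.util.Map;

/** Toy model of a response cache keyed by URI only (hypothetical names). */
final class UriKeyCollisionDemo {

    /** Stand-in for running the pipeline: the browser selector picks the format. */
    static String render(String browser) {
        return "ie".equals(browser) ? "XML with stylesheet PI" : "HTML";
    }

    /** Looks up by URI alone; the browser is consulted only on a cache miss. */
    static String serve(Map<String, String> cache, String uri, String browser) {
        return cache.computeIfAbsent(uri, key -> render(browser));
    }
}
```

If the first request for a URI comes from a browser matching "ie", every later client asking for that URI gets the XML response until the entry expires, whatever the selector would have chosen for it.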

This can of course be avoided with some minor reorganization of the pipelines:

   <map:pipeline type="caching">

     <map:match pattern="">
       <map:select type="browser">
         <map:when test="ie">
           <map:generate src="cocoon:/index.xml"/>
           <map:serialize type="xml"/>
         </map:when>
         <map:otherwise>
           <map:generate src="cocoon:/index.html"/>
           <map:serialize type="html"/>
         </map:otherwise>
       </map:select>
     </map:match>
   </map:pipeline>

   <map:pipeline type="caching">
     <map:parameter name="expires" value="access plus 10 minutes"/>

     <map:match pattern="index.xml">
       <map:generate src="index.xml"/>
       <map:transform type="hypothetical_uncacheable"/>
       <map:transform src="proc_inst.xslt">
         <map:parameter name="stylesheet" value="index2html.xslt"/>
       </map:transform>
       <map:serialize type="xml"/>
     </map:match>

     <map:match pattern="index.html">
       <map:generate src="index.xml"/>
       <map:transform type="hypothetical_uncacheable"/>
       <map:transform src="index2html.xslt"/>
       <map:serialize type="html"/>
     </map:match>
   </map:pipeline>

This is actually pretty close to how my site organizes things.  With this 
layout, URIs as cache keys would work, but I can easily see the URI-only 
approach generating quite a few support requests on the user list.

I think it is worth doing because quite a few things are neither cacheable 
nor needed up to the second.  A view of an online discussion 
need not be immediate (and commonly is not), but a database 
lookup for every reader of that discussion would be a formidable load.  Having a 
centralized expiry would let folks avoid building what amounts to 
futile caching into each component (e.g. a database transformer where it 
is not clear whether the cache interval should be time-based or 
info-based).  An administrator can simply say, "This updates every hour 
so that my PII-300 server doesn't fall over if I get Slashdotted."

------

So my question is this:  Should I

a) use full URIs (in which case somehow the URI needs to be made 
available to the pipeline code...which does not seem to be the case 
currently)

b) use some other mechanism which currently eludes me


Either way, any ideas on how to implement it?  The key is my only problem; 
caching the response is the easy part.
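
Purely for illustration, one shape an answer to (b) might take is a key built from the URI plus the outcome of every selector or action that ran.  This is an assumption on my part, not something the current pipeline code exposes: `CompositeCacheKey`, `record`, and `asString` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cache key: request URI plus recorded selector/action outcomes. */
final class CompositeCacheKey {
    private final String uri;
    private final List<String> decisions = new ArrayList<>();

    CompositeCacheKey(String uri) {
        this.uri = uri;
    }

    /** Records which branch a component took, e.g. record("browser", "ie"). */
    CompositeCacheKey record(String component, String outcome) {
        decisions.add(component + "=" + outcome);
        return this;
    }

    /** Flattens the URI and the decisions, in the order recorded, into one key. */
    String asString() {
        return uri + "|" + String.join(";", decisions);
    }
}
```

With such a key, two requests for the same URI that take different selector branches would land in different cache entries, while identical requests would still share one.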

- Miles Elam


