cocoon-users mailing list archives

From "Peter Wyngaard " <PWynga...@compendiabio.com>
Subject Proper way to cache pipelines with SQL Transformer?
Date Thu, 10 Jan 2008 00:32:29 GMT
Hello,

 

One of my first goals with cocoon was to create a simple, read-only,
RESTful interface to some of the objects in our database.  So, for
example, I'd like to have a set of simple URLs like:

 

http://localhost:8888/myblock/data/study.xml?study_name={study_name}

 

Throwing this together took no time once I got past the Spring
datasource and cocoon-databases-bridge issues in Cocoon 2.2.

 

I quickly found that our database operations are pretty expensive, and
since our database contains a lot of read-only data, it would be nice to
cache the results.  I understand why SQL Transformer isn't cacheable, and
thought I might just subclass it and implement Cacheable for my
application.  But before diving in that deep, I thought it would be
simpler to use ExpiresCachingProcessingPipeline.

 

So:

 

<map:pipe name="caching"
          src="org.apache.cocoon.components.pipeline.impl.ExpiresCachingProcessingPipeline">

      <map:parameter name="cache-expires" value="-1" /> <!-- never expire -->

</map:pipe>

 

and

 

<map:pipeline type="caching">

      <map:parameter name="purge-cache" value="{request-param:purge}" />

 

...

</map:pipeline>

 

This worked great, except that it became clear very quickly that the
request parameters were not being included in the cache-key, because the
two requests:

 

http://localhost:8888/myblock/data/study.xml?study_name=ONE

http://localhost:8888/myblock/data/study.xml?study_name=TWO

 

returned the same results!  So I set out to create my own cache-key.
Here were my requirements:

 

* need to include request parameters as part of the cache key

* need to sort the request parameters so that "...?a=1&b=2" and
"...?b=2&a=1" are cached only once, not twice

* need to exclude any special request parameters, like "purge=true" from
the cache key
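The three requirements above can be sketched in plain Java, independent of Cocoon's InputModule interface. This is a minimal sketch, assuming the request parameters arrive with the key/value shape returned by the Servlet API's `getParameterMap()` (names mapped to `String[]` values); the class `CacheKeyParams` and its `build` method are hypothetical names for illustration, not part of Cocoon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: builds a canonical query string for use in a cache key. */
final class CacheKeyParams {

    private CacheKeyParams() {}

    /**
     * Returns "a=1&b=2"-style text with parameter names sorted and any
     * name listed in {@code excluded} (e.g. "purge") dropped.
     */
    static String build(Map<String, String[]> params, Set<String> excluded) {
        // TreeMap orders by name, so ?a=1&b=2 and ?b=2&a=1 yield the same key.
        TreeMap<String, String[]> sorted = new TreeMap<>(params);
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, String[]> e : sorted.entrySet()) {
            if (excluded.contains(e.getKey())) {
                continue; // control parameters must not split the cache
            }
            // Repeated parameters keep their original value order.
            for (String v : e.getValue()) {
                pairs.add(e.getKey() + "=" + v);
            }
        }
        return String.join("&", pairs);
    }
}
```

A custom InputModule could return this string from its `request-params` attribute. Values are not URL-encoded here, so a value containing `&` or `=` could collide with a different parameter set; encoding each value (for example with `java.net.URLEncoder`) would avoid that.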

 

I couldn't find a way to do this with the existing InputModules, so I
created my own, with a single attribute, "request-params", for this
purpose.  Once it was registered as an <input-module> named
"cache-keygen", I was able to do the following:

 

      <map:parameter name="purge-cache" value="{request-param:purge}" />

      <map:parameter name="cache-key" value="{request:sitemapURI}?{cache-keygen:request-params}" />

 

This worked great.  But I can't help but think that I missed the proper
cocoon way of doing this.  Did I?

 

After working with this solution for a bit, I discovered two things:

 

1.  That IdentifierCacheKey puts a prefix of "IK:{true,false}:" on my
cache-key, and that the true/false indicates whether it was an external
pipeline call or not.
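The duplication implied by point 1 can be modelled in a few lines. This sketch only reproduces the key text observed above ("IK:" + external flag + ":" + key); it is not Cocoon's `IdentifierCacheKey` source, and `ObservedKeyFormat` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the "IK:{true,false}:" prefix observed on cache keys. */
final class ObservedKeyFormat {

    private ObservedKeyFormat() {}

    /** Prefixes a user-supplied cache key with whether the call was external. */
    static String prefixed(boolean external, String key) {
        return "IK:" + external + ":" + key;
    }

    /** Counts the distinct cache entries produced when one key is requested in the given ways. */
    static int entriesFor(String key, boolean... externalFlags) {
        Set<String> entries = new HashSet<>();
        for (boolean external : externalFlags) {
            entries.add(prefixed(external, key));
        }
        return entries.size();
    }
}
```

Under this model, requesting the same pipeline both from a browser and through `cocoon:` stores two entries for identical data, while internal-only calls always produce the single "IK:false:" form.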

 

2.  The external pipelines that invoked these internal pipelines with a
"cocoon:" URL passed their request parameters along.

 

So issue #1 led me to make all these SQL Transformer pipelines
internal-only, so that the cache prefix is always "IK:false:", because I
don't need to keep two copies of everything in the cache, one "true" and
one "false".  So far, this hasn't been a problem for me.  I had
initially made them external, since sometimes I want to fetch these
"raw" database objects directly, while at other times other pipelines
aggregate and transform them into other objects.

 

Issue #2 was also a surprise.  Let's say I create a pipeline that
aggregates a study and some other stuff.  For example:

 

http://localhost:8888/myblock/data/studyData.xml?study_name={study_name}&type={type}

 

And this pipeline calls an internal pipeline to get the study object
using:

 

cocoon:/data/study?study_name={study_name}

 

Since all the request parameters of the "parent" are passed on to the
"child" study pipeline, this results in a cache key:

 

IK:false:data/study?study_name=...&type=...

 

which is a shame, because I only need one cached version of the study.
So after some more digging in the source, I discovered the concept of
"raw" cocoon requests.  As far as I can tell, the "raw" request just
prevents the chaining of parameters and attributes up the call stack.

 

So this just meant I had to call all my internal pipelines with
"cocoon:raw:" instead of "cocoon:".  This worked, and I haven't
discovered any other side-effects of "raw" yet.
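The effect of "raw" on the cache key can be illustrated with a hypothetical model: a child request either inherits the parent's parameters (plain "cocoon:") or sees only its own ("cocoon:raw:"). This is a sketch of the behaviour described above, assuming the child's own parameters override inherited ones of the same name; it does not reproduce Cocoon's request-wrapping code, and `SubRequestParams` is an illustrative name.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical model of parameter visibility in an internal cocoon: sub-request. */
final class SubRequestParams {

    private SubRequestParams() {}

    /** Parameters the child pipeline would see, sorted by name. */
    static Map<String, String> visible(Map<String, String> parent,
                                       Map<String, String> child,
                                       boolean raw) {
        Map<String, String> result = new TreeMap<>();
        if (!raw) {
            result.putAll(parent); // plain cocoon: chains the parent's parameters
        }
        result.putAll(child); // assumption: the child's own values win on a clash
        return result;
    }

    /** Builds a key like "data/study?study_name=ONE" with parameters in name order. */
    static String cacheKey(String uri, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        new TreeMap<>(params).forEach((k, v) -> query.add(k + "=" + v));
        return uri + "?" + query;
    }
}
```

With a parent request carrying `study_name` and `type`, the plain call yields a key containing both parameters, while the raw call yields `data/study?study_name=...` alone, which is the single cached study the message is after.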

 

And that's the story so far.

 

So I'm happy to have made it this far, and things are working well.
These roadblocks, however, left me feeling that I must not be doing
things the "cocoon way".

 

For example, I was surprised that query string parameters are not made
part of the cache-key by default.  Is the "cocoon way" to not put
request parameters in the query string?  Using the query string is
fairly standard practice when designing RESTful interfaces.

 

Thanks!  I have more stories to come.

 

Peter

 

