cocoon-dev mailing list archives

From Sergio Carvalho <scarva...@criticalsoftware.com>
Subject Re: AW: [C2]: Proposal for caching
Date Wed, 24 Jan 2001 16:24:58 GMT
On Wed, 24 Jan 2001 08:24:49 +0100
"Carsten Ziegeler" <cziegeler@sundn.de> wrote:

> > > 2. The most static part of the pipeline should be cached.
> > ?!? The end result should be cached. 
>
> If only the end result is cached, many pipelines would never be cached,
> as the pipeline is only "static" up to a specific point.
> Imagine a pipeline with a generator, four transformers and one serializer
> at the end. The generator reads a static file, the first three transformers
> do "static" transformations and only the fourth is dynamic.
> Then it would be possible to cache the result of the generator and the
> three transformations.
> This approach is necessary for content aggregation, as there is no serializer
> at the end of each pipeline.

I agree with you, *if* we can't make the assumption that transformers depend only on their
XML input. I am really treading on unknown territory here, as I don't know C2's basic design
principles. If transformers get data from sources other than those declared in the sitemap,
you are 100% right. If, on the other hand, transformers get data only from sources declared in
the sitemap, then you can check whether those sources are dirty and deduce when the transformer's
output is dirty. In that case you only need to cache the final result of the pipeline. Aggregators
aggregate pipelines, so each aggregated pipeline has its own cache.
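
A minimal sketch of the dirty check described above, assuming every input of the pipeline is declared in the sitemap and can report a modification timestamp. The names DeclaredSource and FinalResultCache are hypothetical and are not part of Cocoon:

```java
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical: one input declared in the sitemap (file, URL, ...).
final class DeclaredSource {
    final String uri;
    private final LongSupplier lastModified; // e.g. a file's modification time

    DeclaredSource(String uri, LongSupplier lastModified) {
        this.uri = uri;
        this.lastModified = lastModified;
    }

    long lastModified() {
        return lastModified.getAsLong();
    }
}

// Hypothetical: caches only the final output of one pipeline.
final class FinalResultCache {
    private byte[] cachedOutput;   // serialized end result, null if nothing cached
    private long newestSourceSeen; // newest source timestamp at caching time

    // Dirty if nothing is cached yet or any declared source is newer than the cache.
    boolean isDirty(List<DeclaredSource> sources) {
        if (cachedOutput == null) {
            return true;
        }
        for (DeclaredSource source : sources) {
            if (source.lastModified() > newestSourceSeen) {
                return true;
            }
        }
        return false;
    }

    void store(byte[] output, List<DeclaredSource> sources) {
        long newest = 0L;
        for (DeclaredSource source : sources) {
            newest = Math.max(newest, source.lastModified());
        }
        cachedOutput = output;
        newestSourceSeen = newest;
    }

    byte[] get() {
        return cachedOutput;
    }
}
```

This only works if no component reads data behind Cocoon's back. A transformer that queries a database on its own would never make isDirty() return true.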

> > 
> > > 4. If a component is cacheable it should implement an interface 
> > Cacheable.
> > > 5. Caching is done until the first component in the pipeline is 
> > not Cacheable.
> > Rephrasing: Caching is done if every component in the pipeline is 
> > Cacheable. Remember, no caching on each stage.
> OK, I didn't mean caching on each stage, see point 2 for an example. 
> The caching is not done at each stage, only at one specific stage in 
> the pipeline. Until this stage the content is "more" static than
> in the rest of the pipeline.
> But this stage is not necessarily the last stage after the serializer.
>

It should be the last stage, but that is a design decision that has probably been made by now. We
just need to check whether generators depend only on their declared inputs. That is crucial.
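
To make the two rules concrete, here is a rough sketch that compares "cache up to the first non-cacheable component" (point 5) with "cache only if every component is cacheable" (the rephrasing). Stage and CacheableStage are hypothetical stand-ins, not Cocoon interfaces:

```java
import java.util.List;

// Hypothetical: any generator, transformer or serializer in a pipeline.
interface Stage { }

// Hypothetical marker standing in for the proposed Cacheable interface.
interface CacheableStage extends Stage { }

final class CachePoint {
    private CachePoint() { }

    // Point 5: number of leading stages whose combined output may be cached.
    static int cacheablePrefixLength(List<? extends Stage> pipeline) {
        int length = 0;
        for (Stage stage : pipeline) {
            if (!(stage instanceof CacheableStage)) {
                break;
            }
            length++;
        }
        return length;
    }

    // Rephrased rule: cache the end result only if every stage is cacheable.
    static boolean wholePipelineCacheable(List<? extends Stage> pipeline) {
        return cacheablePrefixLength(pipeline) == pipeline.size();
    }
}
```

For the pipeline in Carsten's example (a generator, three static transformers, one dynamic transformer, a serializer) cacheablePrefixLength returns 4, while wholePipelineCacheable returns false, so under the stricter rule nothing would be cached.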


> > 
> > > 6. Identifying if the cache contains the response (or the most 
> > static part
> > >    of it) can be done by a unique key. The Cacheable interface 
> > implements
> > >    a getKey() method. The keys of all Cacheable components are 
> > chained together
> > >    and build the unique key for the cache.
> > -1. If Stuart is right, and transformers' output depends only on 
> > their input, then you just need to check the inputs, and you just 
> > need to cache the final output.
> OK, in some way the transformer's output depends only on its input.
> But what is the input of a transformer? On the one hand, the XML stream,
> which comes from the previous component in the pipeline.
> But e.g. the SQL Transformer fetches data from a database. The XML
> stream specifying which table and rows to fetch is static and
> never changes. So the XML stream is static. But the result of the
> fetch changes over time as the data in the database changes.
> So this "input" can be highly dynamic.

Our discussion really revolves around what a transformer is. Can a transformer get data on
its own, or is all data fed to the transformer by Cocoon? This also applies to question 8.
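
For reference, the key chaining proposed in point 6 could look roughly like the sketch below. KeyedComponent is a hypothetical stand-in for the proposed Cacheable interface with its getKey() method, and the "|" separator is an arbitrary choice that assumes no key contains that character:

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in for the proposed Cacheable interface (point 6).
interface KeyedComponent {
    String getKey();
}

final class PipelineKey {
    private PipelineKey() { }

    // Chains the keys of all components, in pipeline order, into one cache key.
    // Assumes that no individual key contains the '|' separator.
    static String chain(List<? extends KeyedComponent> components) {
        StringJoiner joiner = new StringJoiner("|");
        for (KeyedComponent component : components) {
            joiner.add(component.getKey());
        }
        return joiner.toString();
    }
}
```

Note that the key only identifies a cache entry. An SQL transformer would produce the same key on every request even though the rows it fetches change, which is why the answer to the question above decides whether the key and the input checks are enough.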



> > 
> > > 7. Cacheable also delivers a Validator object (getValidator()) 
> > which contains all
> > >    information required for this component to test if the 
> > content has changed
> > >    since the last caching.
> > >    The FileGenerator e.g. puts the last modification date into 
> > the Validator.
> > >    The Cache stores all Validators together with the unique key.
> > +1. I'd like to see a setValidator also, so that the Validator 
> > can be overridden.
> > 
> > 
> > > 8. All Cacheable components are now asked hasChanged(Validator). If all
> > >    respond with false, the cache value is valid and is used. 
> > >    If the first responds with true, a new response is 
> > generated, the validators 
> > >    are fetched and the result is put in the cache.
> > 
> > -1. See 6. Only the inputs (generators) must be checked.
> > 
> I disagree, see 6. and also the RT for caching: 
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=97300662426038&w=2
> and some of the mails from the XSP Generator task:
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=97661659323627&w=2
> 
> > > 
> > > This approach is very flexible. Each component knows if it is
> > cacheable and when
> > > it is invalid. Using the Validator object it is possible to
> > specify something
> > > like: This component is valid for 6 hours whether it changes or not.
> > 
> > Please, allow the Validator to be overridden. I'm imagining 
> > situations like an HTTPGenerator where the Generator doesn't know 
> > if it is valid and can only safely assume it is not. In these 
> > cases, there are situations where I'd like to override this 
> > default behaviour.
> OK, can you please explain who exactly will override this behaviour?
> I could imagine a configuration for the HTTPGenerator which
> configures the way the Validator is built. E.g. a simple approach would
> be that the HTTPGenerator only reads the input once and assumes that
> it is static. A configuration (or better, a parameter in the pipeline)
> could tell the HTTPGenerator that a specific HTTP request is not
> static, but changes every 3 hours, so it could look like:
> <map:match pattern="test">
>     <map:generate src="http://myserver/resource.xml" type="http">
>         <parameter name="content_validation_period" value="3 hours"/>
>     </map:generate>
> </map:match>
> 
> The validator object would contain the last time the HTTP request was
> made, check whether the 3 hours have expired since then, and respond
> to the hasChanged() method according to this information.
> 
> The same configuration is possible/usable for transformers, e.g. the
> SQL Transformer. The SQL Transformer would know from its configuration
> when to refetch data from the database.
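
The time-based validator quoted above could be sketched roughly as follows. ExpiryValidator is a hypothetical name, the check lives on the validator itself for brevity (the proposal puts hasChanged(Validator) on the component), and parsing a parameter value such as "3 hours" into a Duration is left out:

```java
import java.time.Duration;

// Hypothetical validator that treats content as changed once a fixed
// validation period has elapsed since the last fetch.
final class ExpiryValidator {
    private final long fetchedAtMillis; // when the HTTP request was last made
    private final Duration validFor;    // e.g. Duration.ofHours(3)

    ExpiryValidator(long fetchedAtMillis, Duration validFor) {
        this.fetchedAtMillis = fetchedAtMillis;
        this.validFor = validFor;
    }

    // True once the validation period has expired, regardless of whether
    // the remote content actually changed.
    boolean hasChanged(long nowMillis) {
        return nowMillis - fetchedAtMillis >= validFor.toMillis();
    }
}
```

The current time is passed in rather than read from System.currentTimeMillis() inside the method, so the expiry logic can be checked without waiting three hours.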

Why wonder which parameters you may want to pass to the validator, when you can just replace
the whole validator if you design it to be replaceable? Something like:
  <map:match pattern="test">
    <map:generate src="http://myserver/resource.xml" type="http">
        <parameter name="validator_class" value="com.criticalsoftware.web.FooBarValidator"/>
    </map:generate>
  </map:match>
What do you have against this?
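
A rough sketch of how a generator might honour such a validator_class parameter, assuming validators have a no-argument constructor and that sitemap parameters arrive as a simple map. CacheValidator, AlwaysChangedValidator, NeverChangedValidator and ValidatorFactory are all hypothetical names, not existing Cocoon classes:

```java
import java.util.Map;

// Hypothetical stand-in for the Validator of point 7.
interface CacheValidator {
    boolean hasChanged();
}

// Safe default for a generator that cannot tell: always assume a change.
final class AlwaysChangedValidator implements CacheValidator {
    public boolean hasChanged() {
        return true;
    }
}

// Example of a replacement a site administrator might plug in.
final class NeverChangedValidator implements CacheValidator {
    public boolean hasChanged() {
        return false;
    }
}

final class ValidatorFactory {
    private ValidatorFactory() { }

    // Instantiates the class named by "validator_class", or falls back to
    // the default when the parameter is absent.
    static CacheValidator fromParameters(Map<String, String> parameters) {
        String className = parameters.get("validator_class");
        if (className == null) {
            return new AlwaysChangedValidator();
        }
        try {
            Class<? extends CacheValidator> type =
                    Class.forName(className).asSubclass(CacheValidator.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load validator " + className, e);
        }
    }
}
```

With this shape the generator keeps its conservative default, and overriding it becomes a one-line change in the sitemap instead of a new parameter for every policy.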

-- 
Sergio Carvalho
---------------
scarvalho@criticalsoftware.com

If at first you don't succeed, skydiving is not for you
