cocoon-dev mailing list archives

From Ricardo Rocha <rica...@apache.org>
Subject Re: XSPGenerator
Date Tue, 12 Dec 2000 16:28:03 GMT
Giacomo Pati wrote:

> --- Ricardo Rocha <ricardo@apache.org> wrote:
>> Here, the term "pipeline" is being used with a meaning slightly
>> different from that of Cocoon's ResourcePipeline class: it means
>> a generator plus zero or more transformers, _sans_ a final
>> serializer.
>> 
>> From this perspective, Pipeline extends Generator, so it's possible
>> for a pipeline to contain another pipeline as its generator.
> 
> 
> Technically, it's easy to achieve this for a sitemap engine by
> instructing the Environment to set up a new URL that recursively
> invokes itself, but that only fetches content from another URL; it
> is not aggregation at the generator level. You still need some
> construct like
> 
>   <map:aggregator uri1="foo" uri2="bar" uri3="baz" rootelement="page"/>
> 
> to get at multiple sub-pipelines, plus information on how to
> structure their contents (rootelement).

This sounds very interesting. Please, elaborate!

I see 2 different scenarios for content aggregation:

1) external URIs (which may need to be XMLized, a la
    HTMLGenerator)
2) internal sub-pipelines, which may be defined at the
    sitemap level (as I understand you suggest above)
    or (is this heresy?) programmatically, as sketched below...
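For scenario 2, here's a minimal sketch of what programmatic
aggregation might look like. The Generator contract is reduced to
two methods for the sketch, and AggregatingGenerator is a made-up
name, not existing Cocoon API:

  import org.xml.sax.ContentHandler;
  import org.xml.sax.SAXException;
  import org.xml.sax.helpers.AttributesImpl;

  // Reduced, hypothetical generator contract; in this sketch each
  // sub-pipeline emits a document *fragment* (no startDocument/
  // endDocument events of its own)
  interface Generator {
      void setContentHandler(ContentHandler handler);
      void generate() throws SAXException;
  }

  // Aggregates several sub-pipelines under a common root element,
  // mirroring the <map:aggregator> construct quoted above
  public class AggregatingGenerator implements Generator {
      private Generator[] parts;       // each part may itself be a pipeline
      private String rootElement;      // e.g. "page"
      private ContentHandler handler;  // downstream consumer

      public AggregatingGenerator(Generator[] parts, String rootElement) {
          this.parts = parts;
          this.rootElement = rootElement;
      }

      public void setContentHandler(ContentHandler handler) {
          this.handler = handler;
      }

      public void generate() throws SAXException {
          handler.startDocument();
          handler.startElement("", rootElement, rootElement,
                               new AttributesImpl());
          for (int i = 0; i < parts.length; i++) {
              parts[i].setContentHandler(handler); // stream into shared handler
              parts[i].generate();
          }
          handler.endElement("", rootElement, rootElement);
          handler.endDocument();
      }
  }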

>> This, in turn, calls for caching intermediate results (i.e. SAX
>> events generated by (sub)pipelines).
>> 
>> If we have a "SAXCacheable" interface that XMLProducer
>> implementations (i.e. Generators and Transformers) can also
>> implement, then intermediate pipeline results could be cached
>> by interposing a "tee" ContentHandler that would take care of
>> caching generated SAX events (using a mechanism like Stefano's
>> XMLCompiler). This reminds me of the good ole' Unix "tee" command,
>> used in shell pipelines for exactly the same purpose...
> 
> 
> I think the caching stuff clearly belongs in the ResourcePipeline
> class, because that is where all components are collected and can be
> asked whether they have changed. There is also the possibility of
> integrating the "tee" components into the components' SAX stream,
> either to collect the cacheable events or to start feeding cached
> events into the stream.

Yes, I agree: it's the ResourcePipeline that's responsible
for asking the components whether they have changed.
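Concretely, such a "tee" handler could look like the sketch below,
assuming the cache side is simply a second ContentHandler (e.g. an
XMLCompiler-style event recorder); the class name TeeContentHandler
is made up here:

  import org.xml.sax.Attributes;
  import org.xml.sax.ContentHandler;
  import org.xml.sax.Locator;
  import org.xml.sax.SAXException;

  // Forwards every SAX event both to the next pipeline stage and to
  // a cache-writing handler, like the Unix "tee" command
  public class TeeContentHandler implements ContentHandler {
      private ContentHandler next;   // downstream pipeline stage
      private ContentHandler cache;  // e.g. an XMLCompiler-style recorder

      public TeeContentHandler(ContentHandler next, ContentHandler cache) {
          this.next = next;
          this.cache = cache;
      }

      public void setDocumentLocator(Locator l) {
          next.setDocumentLocator(l); cache.setDocumentLocator(l);
      }
      public void startDocument() throws SAXException {
          next.startDocument(); cache.startDocument();
      }
      public void endDocument() throws SAXException {
          next.endDocument(); cache.endDocument();
      }
      public void startPrefixMapping(String p, String u) throws SAXException {
          next.startPrefixMapping(p, u); cache.startPrefixMapping(p, u);
      }
      public void endPrefixMapping(String p) throws SAXException {
          next.endPrefixMapping(p); cache.endPrefixMapping(p);
      }
      public void startElement(String u, String l, String q, Attributes a)
              throws SAXException {
          next.startElement(u, l, q, a); cache.startElement(u, l, q, a);
      }
      public void endElement(String u, String l, String q) throws SAXException {
          next.endElement(u, l, q); cache.endElement(u, l, q);
      }
      public void characters(char[] c, int s, int n) throws SAXException {
          next.characters(c, s, n); cache.characters(c, s, n);
      }
      public void ignorableWhitespace(char[] c, int s, int n)
              throws SAXException {
          next.ignorableWhitespace(c, s, n); cache.ignorableWhitespace(c, s, n);
      }
      public void processingInstruction(String t, String d)
              throws SAXException {
          next.processingInstruction(t, d); cache.processingInstruction(t, d);
      }
      public void skippedEntity(String name) throws SAXException {
          next.skippedEntity(name); cache.skippedEntity(name);
      }
  }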

Btw, I've come to disagree with the notion that the last step
in a pipeline should be cached in its serialized form only,
never as a SAX event stream (which would be considered
"redundant").

This reasoning appears to assume that any given pipeline can be
serialized in only _one_ way. I'd say the _same_ pipeline may
have different serializers applied to it depending, for instance,
on the source URI extension (html, pdf, etc), so it does make
sense to cache the last SAX event stream in addition (of course!)
to its serialized form...
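As a rough sketch of the resulting cache layout (the class, the key
scheme and the Hashtable-based store are all made up here for
illustration):

  import java.util.Hashtable;

  // One cache entry for a pipeline's final SAX event stream, plus
  // one entry per serialized form (html, pdf, ...)
  public class SerializationCacheSketch {
      private Hashtable cache = new Hashtable();

      public void store(String pipelineKey, Object saxEvents,
                        String extension, byte[] serialized) {
          cache.put(pipelineKey, saxEvents);                    // SAX stream
          cache.put(pipelineKey + ";" + extension, serialized); // e.g. ";html"
      }

      public byte[] lookupSerialized(String pipelineKey, String extension) {
          return (byte[]) cache.get(pipelineKey + ";" + extension);
      }
  }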

For this to work, we must be able to identify [cached] pipeline
_segments_ and "reuse" them in multiple requests!

How could this work?

For one, the ResourcePipeline should test each XMLProducer in
the pipeline ("from left to right") to check whether it
implements Cacheable and, if so, whether it has changed.

As soon as a non-Cacheable XMLProducer is found, no further
caching should be attempted for the remaining pipeline segment.

For those pipeline elements that do implement Cacheable, a
call to hasChanged() should determine whether to use the cached
SAX events (previously collected via the "tee" ContentHandler)
or to re-{generate/transform} the remaining pipeline segment
from that change point on.
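A minimal sketch of that scan; Cacheable, hasChanged() and the
producers array are names from this discussion, not existing
Cocoon API:

  // Minimal per-producer cache contract, as discussed above
  interface Cacheable {
      boolean hasChanged(Object environment);
  }

  public class CacheScanSketch {
      // Scan the pipeline's producers "from left to right" and return
      // the index where execution must restart; everything before it
      // can be replayed from SAX events cached via the "tee" handler
      public static int findRestartPoint(Object[] producers,
                                         Object environment) {
          int restartAt = 0;
          for (int i = 0; i < producers.length; i++) {
              if (!(producers[i] instanceof Cacheable)) {
                  break; // non-Cacheable: no caching from here on
              }
              if (((Cacheable) producers[i]).hasChanged(environment)) {
                  break; // changed: re-generate/transform from this point
              }
              restartAt = i + 1; // cached events up to producer i are valid
          }
          return restartAt;
      }
  }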

Now, how can we assert whether a given pipeline segment has
already been executed and cached? Here, I'm implying that the
same segment may have been generated in the context of different
requests and, possibly, different <map:match> sitemap entries...

For a Cacheable XMLProducer to assert whether its generated
content has changed or not, it may need to have access to the
original request URI, the actual source ("src" attribute) URI,
the request parameters and other variables. Let's (abusively!)
call these the "producer invocation context."

Let's assume that Cacheable defines a getKey() method that maps
this "context" to a unique key value that can be used to
store/locate it in the cache.
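In code, this extends the minimal Cacheable contract from the scan
sketch above; Object stands in (abusively, again) for the
invocation context:

  // Cacheable with the proposed getKey() method added; the
  // environment argument stands in for the "producer invocation
  // context" (request URI, "src" URI, request parameters, ...)
  public interface Cacheable {
      // Maps the invocation context to a unique key used to
      // store/locate cached SAX events
      String getKey(Object environment);

      // Tells the ResourcePipeline whether previously cached
      // events for this producer are still valid
      boolean hasChanged(Object environment);
  }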

Let's further assume that such keys are "additive" in that
they can be safely concatenated to represent the aggregation
of pipeline segments...
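Using that interface, a segment key could then be built by
straightforward concatenation (the helper name and separator are
made up for the sketch):

  // Build the cache key for a pipeline segment by concatenating the
  // producers' individual keys (additive keys, as assumed above)
  public static String segmentKey(Cacheable[] producers,
                                  Object environment, int length) {
      StringBuffer key = new StringBuffer();
      for (int i = 0; i < length; i++) {
          key.append(producers[i].getKey(environment));
          key.append(':'); // separator keeps concatenation unambiguous
      }
      return key.toString();
  }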

Hmmm... I may be daydreaming, but if something like this is
feasible, caching _and_ content aggregation could be achieved
with low memory consumption and high "reusability".

As usual, just an idea...


