cocoon-dev mailing list archives

From Paul Russell <p...@luminas.co.uk>
Subject Re: [RT] Cocoon in cacheland
Date Tue, 28 Nov 2000 10:04:09 GMT
On Tue, Nov 28, 2000 at 10:40:06AM +0800, Niclas Hedhman wrote:
> "Stevenson, Chris (SSABSA)" wrote:
> > Hear hear!
> >
> > I remember Stefano's original post, and I appreciate this as
> > well, but I am left with a nagging doubt: no-one
> > has convinced me that we need 'Internal Caching'.
> I agree.

Yep. I agree with both of you; I still remain to be convinced. But
at the moment, Cocoon 2's generation phase is so slow as to be useless
for dynamic content (in many cases it takes more than a second on my
P-III 600 laptop), and in my systems several things can affect which
content is delivered (user agent, target virtual host, target client
language, you name it).

> There was a long discussion about this some 6-12 months ago, perhaps
> longer, where the task is a lot more complicated than what at first
> seems to be the case (even the color of the goldfish may become
> relevant).

Heh. I agree, hence letting the components decide what's relevant
to them.
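A minimal sketch of that idea in Java, assuming each pipeline component can report the request details it depends on as a string fragment. `KeyContributor` and `CacheKeys` are hypothetical names, not Cocoon APIs, and a plain header map stands in for the servlet request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical: each pipeline component reports only the request details it depends on. */
interface KeyContributor {
    /** Returns this component's fragment of the cache key (a constant if the request doesn't matter to it). */
    String keyFragment(Map<String, String> requestHeaders);
}

final class CacheKeys {
    private CacheKeys() {}

    /** Joins the fragments from every component, in pipeline order, into one cache key. */
    static String build(List<KeyContributor> components, Map<String, String> headers) {
        List<String> parts = new ArrayList<>();
        for (KeyContributor c : components) {
            parts.add(c.keyFragment(headers));
        }
        // "|" is an arbitrary separator chosen for this sketch.
        return String.join("|", parts);
    }
}
```

For example, a transformer that varies by browser might return `"ua=" + headers.get("User-Agent")`, while a static generator returns a constant, so only the request details that some component actually depends on end up in the key.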

> I agree with Paul's initial assertion that external caching should be
> looked at very closely, BUT again there are nice traps in that too.
> Does the HTTP 1.1 caching architecture take User-Agents into account?
> I don't know, but it is this that is the foundation on which Cocoon once
> was created. Different content to different users.

As far as I'm aware, no, external caches don't take the user agent
into account. This is why I believe we need, at the very least, a
byte stream cache inside Cocoon. With some generation requests taking
more than a second, and some sites taking 10 or more hits per second
(that is, ten or more seconds of generation work arriving every
second), you rapidly approach 'big trouble'. Hardware is cheap, but
sadly, fast servlet engines (particularly ones with EJB support)
aren't: JRun, for example, costs us approx £5000 per *processor*.
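A minimal sketch of such a byte stream cache, assuming the key already folds in every request detail that matters (user agent, host, language). `ByteStreamCache` is a hypothetical class, and least-recently-used eviction via `LinkedHashMap` is just one simple policy chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory byte stream cache: serialized responses keyed by a request-derived key. */
final class ByteStreamCache {
    private final Map<String, byte[]> entries;

    ByteStreamCache(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the cache is full.
        this.entries = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns a copy of the cached bytes, or null on a miss. */
    synchronized byte[] get(String key) {
        byte[] body = entries.get(key);
        return body == null ? null : body.clone();
    }

    /** Stores a copy so later changes to the caller's array don't leak into the cache. */
    synchronized void put(String key, byte[] body) {
        entries.put(key, body.clone());
    }

    synchronized int size() {
        return entries.size();
    }
}
```

On a hit the servlet would write the stored bytes straight to the response and skip generation entirely; on a miss it would run the pipeline, capture the serialized output and `put` it.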

> Next, the internal caching introduces domino effects, where changes
> early in the chain invalidate the cache, and any management will be
> wasted.

Yep. We need to minimise this, obviously. At the end of the day,
what changes most often is the content. Most of the time, the
stylesheets don't change too much. It's a fundamental problem
with pipeline caches unfortunately - where do you draw the line?
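One way to see the domino effect is to chain each stage's key onto the keys of everything upstream of it. This is a sketch under the assumption that each stage can describe its own inputs as a string (for instance a file name plus its last-modified time); `StageKeys` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: derives one cache key per pipeline stage, each including everything upstream. */
final class StageKeys {
    private StageKeys() {}

    /**
     * stageStates holds, in pipeline order, a description of each stage's own inputs
     * (e.g. "doc.xml@<mtime>" for a generator, "style.xsl@<mtime>" for a transformer).
     */
    static List<String> chain(List<String> stageStates) {
        List<String> keys = new ArrayList<>();
        String upstream = "";
        for (String state : stageStates) {
            // A stage's output depends on its own inputs and on every stage before it.
            upstream = upstream.isEmpty() ? state : upstream + " > " + state;
            keys.add(upstream);
        }
        return keys;
    }
}
```

A change to the content (the generator's input) alters every downstream key, whereas a stylesheet change leaves the generator's entry reusable; that asymmetry is why the choice of where to cache matters.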

I did some work a while back with some sound editing tools
which had an 'audio pipeline': sound generators at one end,
then filters, mixers, attenuators, you name it. All these
modules had parameters which changed at runtime, either on
their own or under user control. Because there were a lot of
places where the pipelines joined, it was *definitely* necessary
to have intermediate levels of cache at these locations, but
it normally wasn't elsewhere. Maybe this is what we should
do: cache the SAX results of subpipelines, but nowhere else.
This would let us gain performance benefits when we can
execute one generation stage without executing the other, but
not take up storage space where it's not going to help matters.
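Caching "the SAX results of subpipelines" amounts to recording the event stream once and replaying it later. A minimal sketch using only the JDK's `org.xml.sax` classes; `SaxRecorder` is a hypothetical name, and for brevity it records only document, element and character events (no prefix mappings, processing instructions or ignorable whitespace).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Records a subset of SAX events so a subpipeline's output can be replayed without re-running it. */
final class SaxRecorder extends DefaultHandler {
    /** One recorded event; replay() re-sends it to the given handler. */
    private interface Event {
        void replay(ContentHandler h) throws SAXException;
    }

    private final List<Event> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add(ContentHandler::startDocument); }

    @Override
    public void endDocument() { events.add(ContentHandler::endDocument); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Parsers may reuse the Attributes object, so keep a copy.
        final Attributes copy = new AttributesImpl(atts);
        events.add(h -> h.startElement(uri, localName, qName, copy));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add(h -> h.endElement(uri, localName, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Parsers may reuse the character buffer, so copy the text out.
        final char[] text = Arrays.copyOfRange(ch, start, start + length);
        events.add(h -> h.characters(text, 0, text.length));
    }

    /** Replays every recorded event, in order, to the downstream handler. */
    void replay(ContentHandler downstream) throws SAXException {
        for (Event e : events) {
            e.replay(downstream);
        }
    }
}
```

At a join point, the recorder would sit where the subpipeline's output enters the main pipeline; later requests replay the stored events instead of regenerating them.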

> I suggest that the old thread be reviewed, because it exposed a whole
> series of traps. My suggestion is to handle only:
> 
> a) hasChanged( request ) from each component; if all answer NO, then
> send back the full cached response.

The trouble is that what 'hasChanged' means depends on the entry
that's already been cached: changed since *when*? I'd much
rather use Validators unless you can give me a reason why
hasChanged is better...?
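The difference can be sketched as follows, assuming a validator is created alongside each cache entry and captures what it needs at that moment. `Validator` and `LastModifiedValidator` are hypothetical names, and the `LongSupplier` stands in for something like a file's last-modified time.

```java
import java.util.function.LongSupplier;

/** Hypothetical: asked about one specific cache entry, not about the request in general. */
interface Validator {
    boolean isValid();
}

/** Valid while the source's last-modified time still matches the value seen when the entry was cached. */
final class LastModifiedValidator implements Validator {
    private final LongSupplier currentLastModified;
    private final long lastModifiedWhenCached;

    LastModifiedValidator(LongSupplier currentLastModified) {
        this.currentLastModified = currentLastModified;
        // Snapshot taken at the moment the cache entry is created.
        this.lastModifiedWhenCached = currentLastModified.getAsLong();
    }

    @Override
    public boolean isValid() {
        return currentLastModified.getAsLong() == lastModifiedWhenCached;
    }
}
```

Two entries cached at different times carry different snapshots, so the same source can be valid for one entry and stale for the other; a single `hasChanged(request)` answer cannot express that.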

> b) Each component, especially the dynamic ones, uses an "intelligent"
> approach to hasChanged(). Most data does not need to be signaled as
> changed, even if it has. It could delay that change a couple of
> seconds/minutes, to allow more hits. The application decides. So in the
> case of the SQL handling, it needs to be given hints on the "death-span"
> of a dataset, i.e. how long this data can be considered valid AFTER a change.

Yeah, I agree. Look at content syndication code: you don't
want to be pulling RDF feeds from all over the net for every
request; once an hour will do. This is covered by validators
too, however (the component just certifies the cache entry as
valid until it feels too long has passed).
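Both the hourly feed refresh and Niclas's "death-span" fit the validator shape. A minimal sketch, assuming times in milliseconds supplied by the caller; `TimedValidator`, `ExpiryValidator` and `DeathSpanValidator` are hypothetical, and the `LongSupplier` stands in for whatever reports when the underlying data (say, an SQL table) last changed.

```java
import java.util.function.LongSupplier;

/** Hypothetical validator shape for time-based checks; all times are in milliseconds. */
interface TimedValidator {
    boolean isValid(long nowMillis);
}

/** Valid for a fixed period after creation, e.g. one hour for a syndicated RDF feed. */
final class ExpiryValidator implements TimedValidator {
    private final long expiresAt;

    ExpiryValidator(long createdAtMillis, long maxAgeMillis) {
        this.expiresAt = createdAtMillis + maxAgeMillis;
    }

    @Override
    public boolean isValid(long nowMillis) {
        return nowMillis < expiresAt;
    }
}

/**
 * "Death-span": once the underlying data changes, the cached copy may still be
 * served for deathSpanMillis before it has to be regenerated.
 */
final class DeathSpanValidator implements TimedValidator {
    private final long createdAtMillis;
    private final long deathSpanMillis;
    private final LongSupplier lastChangeMillis; // assumed source of "when did the data last change"

    DeathSpanValidator(long createdAtMillis, long deathSpanMillis, LongSupplier lastChangeMillis) {
        this.createdAtMillis = createdAtMillis;
        this.deathSpanMillis = deathSpanMillis;
        this.lastChangeMillis = lastChangeMillis;
    }

    @Override
    public boolean isValid(long nowMillis) {
        long changed = lastChangeMillis.getAsLong();
        // Unchanged since the entry was cached, or changed too recently to matter yet.
        return changed <= createdAtMillis || nowMillis < changed + deathSpanMillis;
    }
}
```

Here the component, not the cache, decides how stale is acceptable, which is the "application decides" point above.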

> After the previous debate, I was pretty much convinced that the
> complexity of advanced schemes would introduce too many bugs, where the
> cached page would be sent when it was not supposed to be, or not sent
> when it was. The intermediary caching will send Cocoon into gigabyte
> land for sure, meaning more objects written to disk, meaning slower cache
> response. And finally, I believe that the above will introduce enormous
> speed benefits for static content, just like C1, and reasonable speed
> improvements for dynamic content.

What do you think about caching at pipeline join points? The
trouble is that at the moment, because content aggregation is
still on the global TODO, we have no concept in the sitemap
of one pipeline joining another... Thoughts?


Paul

-- 
Paul Russell                               <paul@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.
