cocoon-dev mailing list archives

From Sylvain Wallez <sylv...@apache.org>
Subject Fixing store design (long) (was Re: CocoonForms server sizing?)
Date Fri, 05 Dec 2003 22:38:29 GMT
Geoff Howard wrote:

> Bruno Dumon wrote:
>
>> On Wed, 2003-12-03 at 12:24, Joerg Heinicke wrote:
>>
>>> On 03.12.2003 10:01, Leszek Gawron wrote:
>>
> ....
>
>> There's also another problem: the store used to cache the stylesheets 
>> apparently tries to serialize the cached items to disk, as reported 
>> here:
>> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=106969948018306&w=2
>>
>> just had a quick look into it, it seems that if we would set the 
>> use-persistent-cache option of the transient store to false this 
>> should be fixed. I don't know if this would be acceptable though 
>> (depends on which other components put items in that store).
>
>
> Unless I'm misunderstanding you, the cacheable pipelines put cached 
> results there not only on container shutdown, but after they are 
> bumped off the bottom of the MRU stack in memory.  Disabling this 
> feature by default would be bad I think.


This subject comes up again regularly, and I would like to solve it definitively.

Cocoon currently has two stores:
- a transient store: <transient-store> in cocoon.xconf, 
Store.TRANSIENT_STORE in Java code
- a persistent store: <persistent-store>, Store.PERSISTANT_STORE which 
equals Store.ROLE (more on this below)

The transient-store, as its name implies, should be transient, and used 
to cache objects that should not be serialized, either because they 
cannot (the case of Xalan Transformers) or because it doesn't make sense 
(e.g. some app-related data that needs to be refreshed on startup).

The persistent store should be used for objects that are either 
long-lived or whose size justifies storing them on disk. This includes of 
course the CachingPipeline results.

                          --- oOo ---

This is the theory, now let's take a closer look at our actual setup:

Transient-store
---------------
What is this "use-persistent-cache" parameter on a *transient* cache? 
Let's look at the implementation, which is 
org.apache.excalibur.store.impl.MRUMemoryStore.

This parameter allows the MRUMemoryStore to cooperate with another store 
(role Store.PERSISTANT_STORE), where data will be flushed when the store 
is requested to free some memory or when it is disposed. This is a nice 
feature that provides a two-stage cache (memory + disk) like in 
web browsers, which keeps commonly accessed data in memory (very fast 
access) and swaps less used data to disk (slower than memory, but way 
faster than regeneration).
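
To make the two-stage idea concrete, here is a minimal sketch. It is not the MRUMemoryStore code: the class TwoStageCache and its methods are hypothetical, and a plain HashMap stands in for the on-disk store. It only shows the mechanism: a bounded in-memory map that swaps its least recently used entry to a back-end when the limit is exceeded.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch for illustration; not the Excalibur MRUMemoryStore.
class TwoStageCache {
    // Stands in for the persistent (on-disk) store.
    private final Map<String, Object> backEnd = new HashMap<>();
    private final LinkedHashMap<String, Object> memory;

    TwoStageCache(final int maxObjects) {
        // accessOrder=true keeps the least recently used entry first.
        this.memory = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                if (size() <= maxObjects) {
                    return false;
                }
                // Only entries that claim to be Serializable are swapped out;
                // anything else is simply dropped.
                if (eldest.getValue() instanceof Serializable) {
                    backEnd.put(eldest.getKey(), eldest.getValue());
                }
                return true;
            }
        };
    }

    void store(String key, Object value) {
        backEnd.remove(key); // avoid a stale copy in the second stage
        memory.put(key, value);
    }

    Object get(String key) {
        Object value = memory.get(key);
        if (value == null) {
            value = backEnd.remove(key);
            if (value != null) {
                memory.put(key, value); // promote back into memory
            }
        }
        return value;
    }

    boolean inMemory(String key) { return memory.containsKey(key); }

    boolean onBackEnd(String key) { return backEnd.containsKey(key); }

    public static void main(String[] args) {
        TwoStageCache cache = new TwoStageCache(2);
        cache.store("a", "A");
        cache.store("b", "B");
        cache.store("c", "C"); // exceeds the limit, so "a" is swapped out
        System.out.println("a in memory: " + cache.inMemory("a"));
        System.out.println("a on back-end: " + cache.onBackEnd("a"));
    }
}
```

Note that the instanceof check in the sketch is the only filter before swapping an entry out.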

But this feature *is not suitable* for a transient store!!! The 
transient store should be a memory store with no persistent back-end!!!

But as we'll see below, if we make it really transient, we kill the 
pipeline cache...

Note also that only objects instanceof Serializable are stored in the 
persistent store, but having the top-level object of an object graph 
implement Serializable is no guarantee that the whole graph is 
Serializable. This leads to a lot of errors as reported in the post 
linked above.
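
The Serializable pitfall is easy to reproduce in isolation. The following is a small sketch with hypothetical classes (CompiledTemplate and CachedEntry are made up for illustration and are not Cocoon classes): the top-level object passes the instanceof test, but writing it fails because a field deeper in the graph is not serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes for illustration; they are not part of Cocoon.
class SerializableGraphDemo {

    /** Not Serializable; stands in for something like a compiled stylesheet. */
    static class CompiledTemplate {
        final String name = "page2html";
    }

    /** Passes an instanceof Serializable check, yet holds a non-serializable field. */
    static class CachedEntry implements Serializable {
        private static final long serialVersionUID = 1L;
        final CompiledTemplate template = new CompiledTemplate();
    }

    /** Returns true only if the whole object graph can actually be written. */
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object entry = new CachedEntry();
        System.out.println("instanceof Serializable: " + (entry instanceof Serializable));
        System.out.println("actually serializable:   " + canSerialize(entry));
    }
}
```

The first line printed reports true and the second false, which is exactly the situation a store runs into when it relies on the instanceof check alone.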


Persistent-store
----------------
The implementation for persistent-store is 
org.apache.cocoon.components.store.impl.DefaultStore which extends 
AbstractJISPFileSystemStore. This implementation stores data directly on 
disk without a fast in-memory front-end.

So the persistent store is really persistent (which is good), but is slow.


Pipeline caching
----------------
The CachingPipeline uses a Cache component to load/save cached 
responses. The only implementation of Cache, CacheImpl, uses a store 
which is... Store.TRANSIENT_STORE!!!

Why so? Well, in the buggy setup we have, the way to use an efficient 
two-stage store (memory + filesystem) is by using the transient cache. 
So if we make the transient-store really transient, we have no more 
on-disk cache. Weird, no?


Transient-cache's "maxobjects"
------------------------------
The transient cache has a "maxobjects" of 100, meaning that at most 100 
non-serializable objects will be kept in memory. This is obviously too 
low, especially considering that pipeline content also goes in this 
cache, and that Cocoon pre-analyzes lots of things (stylesheets, 
jxtemplates, XSP logicsheets, woody form definitions) that would benefit 
from being kept longer in memory.

And what's the point of having a store-janitor that is supposed to flush 
the stores when memory is low if there is such a low hard limit?


Private caches all over the place
---------------------------------
I mentioned above components that pre-analyze files like jxtemplate, 
woody form definitions, flowscript, etc. Now if we look closer at these 
components, we see that each of them has its own private cache (often a 
static Map). This means that every loaded file is kept in memory 
forever, even if only used once in the system lifetime, and even if the 
corresponding file is actually deleted!


                          --- oOo ---

Obviously, there's a big problem in store-land! So here are the 
solutions I propose to solve all these issues:

Clarify the store semantics
---------------------------
As we've seen above, the Store interface provides 3 roles: Store.ROLE, 
Store.TRANSIENT_STORE and Store.PERSISTANT_STORE. But the problem is 
that PERSISTANT is defined as equal to ROLE, so we actually only have 
two real roles.

I propose to clearly distinguish the 3 roles and the associated semantics:
- Store.ROLE is the "general-purpose" store. A component that doesn't 
care if the cache is transient or persistent should use this one. Being 
general-purpose, it should be efficient but also swap old objects to 
persistent storage.

- Store.TRANSIENT_STORE should be used to keep objects that aren't 
serializable but should be kept in memory as long as possible. The flush 
strategy of this store should not be mixed with the limited-size MRU 
policy of a persistent store front-end.

- Store.PERSISTANT_STORE should be, as its name implies, purely 
persistent, with no memory front-end whatsoever.


Redefine Cocoon stores
----------------------
With the above definitions, here is how the 3 stores should be 
configured in Cocoon:

<transient-store> should be a MRUMemoryStore with 
use-persistent-cache=false (the default) and no maxobjects limit. It 
will be flushed as needed by the store janitor.

<store> should be a MRUMemoryStore with use-persistent-cache=true, and a 
fixed maxobjects (can be tuned according to the physical memory). It 
therefore becomes a two-stage cache with a limited number of objects in 
memory.

<persistent-store> can be the current Jisp-based implementation.

Note that it's very unlikely that some component other than <store> will 
directly use <persistent-store> (direct read/write to disk without a 
memory front-end isn't efficient). So we may want to write a new class 
that combines MRU+Jisp, which would let us remove <persistent-store> from 
the xconf file.


Review store usage in the code
------------------------------
Once we have a clean cache setup, we can review the code and use the 
stores according to their respective semantics.

The pipeline cache, and all other storage of Serializable objects should 
go into Store.ROLE.

Stylesheets, flowscripts, jxtemplates, woody form defs, etc, should no 
longer have a private cache, but should use Store.TRANSIENT_STORE.
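
As a rough sketch of what moving from a private cache to a shared store could look like: everything below is hypothetical. SimpleStore is a simplified stand-in and not the Excalibur Store interface, and DefinitionLoader is a made-up component. The point is only that once the cache lives in a shared store, something like the store janitor can free it, and the component simply re-parses on the next request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a shared transient store;
// this is NOT the Excalibur Store interface.
interface SimpleStore {
    Object get(String key);
    void store(String key, Object value);
    void clear(); // what a janitor would trigger when memory runs low
}

class MemoryStore implements SimpleStore {
    private final Map<String, Object> entries = new HashMap<>();

    public Object get(String key) { return entries.get(key); }

    public void store(String key, Object value) { entries.put(key, value); }

    public void clear() { entries.clear(); }
}

// Hypothetical loader that caches parsed definitions in the shared store
// instead of a private static Map.
class DefinitionLoader {
    private final SimpleStore store;
    int parseCount = 0; // exposed only to make the behaviour observable

    DefinitionLoader(SimpleStore store) {
        this.store = store;
    }

    String load(String uri) {
        String key = "definition:" + uri;
        String parsed = (String) store.get(key);
        if (parsed == null) {
            parsed = parse(uri);
            store.store(key, parsed);
        }
        return parsed;
    }

    // Placeholder for the real (expensive) parsing step.
    private String parse(String uri) {
        parseCount++;
        return "parsed(" + uri + ")";
    }

    public static void main(String[] args) {
        MemoryStore shared = new MemoryStore();
        DefinitionLoader loader = new DefinitionLoader(shared);
        loader.load("forms/login.xml");
        loader.load("forms/login.xml"); // served from the shared store
        shared.clear();                 // memory reclaimed by the janitor
        loader.load("forms/login.xml"); // parsed again on demand
        System.out.println("parse count: " + loader.parseCount);
    }
}
```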


                          --- oOo ---

Conclusion
----------
The proposed changes aim to clarify the respective roles of the various 
stores, and make them behave as they should according to their names 
(e.g. transient is really transient). This should allow us to better 
understand what's going on in the system, and optimize memory usage for 
better scalability.

So, what do you think?

Sylvain

[1] http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104705319323741&w=2

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com


