Subject: RE: StoreJanitor (was: Re: Moving reduced version of CachingSource to
 core | Configuration issues)
Date: Tue, 3 Apr 2007 13:12:27 +0200
Message-ID: <A955EA1F8FE31749AEC8C998082F6C7CD6E914@hai01.hippo.local>
From: "Ard Schrijvers" <a.schrijvers@hippo.nl>
To: <dev@cocoon.apache.org>

Hello,
>
> Ard Schrijvers wrote:
> > i would be glad to share the code and my ideas, for example about
> > this whole StoreJanitor idea :-)  )
>
> Just curious, what did you mean by "this whole StoreJanitor idea"?

Before I say things that are wrong, please consider that the
StoreJanitor was invented long before I looked into the Cocoon code, so
there have probably been a lot of discussions and good ideas that I am
not aware of. Still, here are my thoughts on the StoreJanitor (sorry for
the long mail, but it might contain something useful):

1) How it works and its intention (I think :-) ): The StoreJanitor was
originally invented to monitor Cocoon's memory usage, and it does this
by checking some memory values every X (default 10) seconds. Apart from
the fact that I doubt users know how important it is to configure the
StoreJanitor correctly, I stick to the defaults and use a heapsize just
a little lower than the JVM's maximum memory.

Now, every 10 seconds, the StoreJanitor checks whether
(getJVM().totalMemory() >= getMaxHeapSize()) && (getJVM().freeMemory()
< getMinFreeMemory()) is true. If so, the next store is chosen (relative
to the previous one) and entries are removed from this store. (I saw a
post saying that in trunk a single store is no longer chosen, but an
equal part is removed from all of them, right? Probably you can
configure which stores to use, I don't know.)
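
The check described above can be written down as a small sketch. This is
not the actual StoreJanitor source: the class name MemoryCheck, the
constructor and the static helper isLow() are hypothetical, and only the
condition itself is taken from the text. The thresholds are assumed to
be in bytes.

```java
// Hypothetical sketch of the periodic memory check; names are illustrative
// and are NOT taken from Cocoon's StoreJanitor implementation.
final class MemoryCheck {
    private final long maxHeapSize;   // assumed: threshold in bytes for totalMemory()
    private final long minFreeMemory; // assumed: threshold in bytes for freeMemory()

    MemoryCheck(long maxHeapSize, long minFreeMemory) {
        this.maxHeapSize = maxHeapSize;
        this.minFreeMemory = minFreeMemory;
    }

    /** The condition from the text, as a pure function of four numbers. */
    static boolean isLow(long total, long free, long maxHeap, long minFree) {
        return (total >= maxHeap) && (free < minFree);
    }

    /** Evaluates the condition against the running JVM at this instant. */
    boolean memoryIsLow() {
        Runtime jvm = Runtime.getRuntime();
        return isLow(jvm.totalMemory(), jvm.freeMemory(), maxHeapSize, minFreeMemory);
    }
}
```

Note that the result only reflects the instant at which it is evaluated;
nothing is known about the seconds between two checks.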

2) My observations: When running high-traffic sites and rendering them
live (with only mod_cache in between, which holds pages for 5 to 10
minutes), like [1] or [2], checking every X seconds whether the JVM is
low on memory doesn't make sense to me. At the moment of checking, the
JVM might be perfectly sound and just need some extra memory for a
moment; in that case, the StoreJanitor removes items from the cache when
this is not needed. Conversely, when the JVM really is in trouble but
the StoreJanitor does not check for another 5 seconds, that might be too
long for a JVM in a high-traffic site that is low on memory. Problems
that result from this are:

- Since there is no way to remove cache entries from the cache
implementation in use according to the cache's eviction policy, entries
are removed from memory starting from entry 0, whatever that happens to
be in the cache. It is very likely that on the very next request the
same cache entries are added again.

- Once the JVM gets low on memory and the StoreJanitor is needed, it is
quite likely that from that moment on the StoreJanitor runs *every* 10
seconds and keeps removing cache entries that you perhaps don't want
removed, like compiled stylesheets.
	1) Suppose that 10% (the default) is removed from one store (or, since
trunk, from multiple stores). This 10% is taken from the number of
memory cache entries. I quite frequently have only 200 entries in memory
for each store (I have added *many* different stores to enable
everything we wanted in a high-traffic environment) and the rest is in
the disk store. Now suppose the JVM, which has 512 MB of memory, is low
on memory and removes 10% of 200 entries = 20 entries: that helps me
zero! These memory entries are my most important ones, so on the next
request they are either added again, or I get a hit from the disk cache,
which means the cache puts the entry in memory again. If I used 2000
memory items, I am very sure the 200 items that are cleaned would be put
back in memory before the next StoreJanitor run.
	2) I am not sure whether in trunk you can configure the StoreJanitor to
leave one store alone, like the DefaultTransientStore. Compiled
stylesheets and i18n resource bundles typically end up in this store.
Since these are needed on virtually every request, I would rather the
StoreJanitor did not remove anything from this store. I think the
StoreJanitor does so anyway, leaving my "critical app" in an even worse
state: on the next request, the hardly improved JVM needs to recompile
stylesheets and i18n resource bundles.
	3) What if the JVM being low on memory is not caused by the stores? For
example, you have added some component with a problem you did not know
about, and that component is the real reason for your OOM. The
StoreJanitor sees the low memory and starts removing entries from your
perfectly sound cache, leaving your app in a much worse situation than
it already was. Your leaking component now has some more memory to fill,
and happily does so, making the StoreJanitor remove more and more
entries until the cache ends up empty. You could blame the faulty
component for this behavior. One such faulty component in use is the
event registry for event caching, which made our high-traffic sites with
512 MB crash every two days. I had better write in another mail what I
did to the event cache registry, why I have not yet posted about it,
and, if others are interested, how to include it in trunk. The bottom
line is that there was a major OOM problem when the registry grows,
resulting in the StoreJanitor removing cache entries, which actually
made the problem worse.
	4) By default, most people are probably using ehcache, and naturally
overflow-to-disk is true. In a high-traffic site, the number of cache
keys can grow enormously (I have seen mails from people complaining
about disk caches growing to multiple gigabytes). This certainly happens
when a not very experienced user uses something like a session attribute
(or a timestamp, among many other possibilities) in a stylesheet
parameter that ends up in the cache key (but then, should Cocoon be
aimed at high-traffic sites for the average user? I don't know). Now,
and this is IMO one of the major weaknesses of ehcache (or I missed it
completely), I did not find any way to limit the number of disk store
entries. This implies that the disk store can grow indefinitely. For
those who have ever looked at the status page: cache keys of about 2 KB
in memory are quite common in Cocoon (the depth of your app's folder
structure actually has an influence). The disk store cache keys are kept
in *memory*. So suppose you run your app with 128 MB and
overflow-to-disk=true: your app runs into problems when there are about
50,000 keys in the cache. Then the StoreJanitor keeps removing entries
from your memory cache, which is refilled with disk store entries just a
few moments later. Now, if you really know how to configure your stores,
you use timeToLiveSeconds and timeToIdleSeconds to let your store clear
unused cache entries. This is good to do, unless you depend on something
like the event registry currently in Cocoon trunk. The problem is that
the StoreJanitor removes cache entries by calling free() on the store in
question, which might, for example, be the event-aware store. This
event-aware store updates (cleans) its registry before removing the
cache entry from its delegate. But when cache entries are cleaned
internally through timeToLiveSeconds or timeToIdleSeconds, the event
registry is not cleaned, which leads to OOM in the long run.
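
The numbers in points 1) and 4) can be checked with a little arithmetic.
The sketch below is only a back-of-the-envelope calculation: the class
and method names are hypothetical, and the 2 KB per key, 10% and 50,000
keys are the figures quoted above, not measurements.

```java
// Hypothetical helper for the figures quoted in the text; not Cocoon code.
final class JanitorArithmetic {

    /** Number of in-memory entries removed when `percent` percent is freed. */
    static long entriesFreed(long memoryEntries, int percent) {
        return memoryEntries * percent / 100;
    }

    /** Heap taken by cache keys alone, assuming a fixed size per key. */
    static long keyBytes(long keyCount, long bytesPerKey) {
        return keyCount * bytesPerKey;
    }

    public static void main(String[] args) {
        System.out.println(entriesFreed(200, 10));   // 20 entries
        System.out.println(entriesFreed(2000, 10));  // 200 entries

        // 50,000 keys of about 2 KB each, all held in memory
        long bytes = keyBytes(50_000, 2 * 1024);
        System.out.println(bytes);                     // 102400000 bytes
        System.out.println(bytes / (1024.0 * 1024.0)); // 97.65625 (MiB)
    }
}
```

Under these assumptions the keys alone take roughly 98 MB of a 128 MB
heap, while a janitor run on a 200-entry memory store frees only 20
entries.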

I have more to say about this, but probably nobody would read it
anymore. In short, my conclusion is that the StoreJanitor never helped
me out, but merely impoverished my app whenever it ran.

						--------o0o--------

The rules I try to follow to keep the StoreJanitor from running:

1) use readers in noncaching pipelines and set expires on them to avoid
cache/memory pollution
2) use a different store, with only a disk store part and no memory
part, for repository binary sources (cached-binary: protocol added)
3) use a different store for repository sources than for the pipeline
cache
4) replaced the abstract double mapping event registry with one that
uses weak references, and let the JVM clean up my event registry
5) (4) gave me undesired behavior: weakrefs were removed in combination
with ehcache when items overflowed to disk (I could not reproduce this,
but it seems that my references to cache keys got lost). Testing with
JCSCache solved this problem, gave me faster response times, and gave me
for free a way to limit the number of disk cache entries. A disadvantage
of the weak references is that I disabled persistent caches across JVM
restarts, but I never wanted this anyway (it might be implemented quite
easily, but might lead to long startup times)
6) JCSCache has a complex configuration IMO. Therefore, I added default
configurations to choose from, for example:
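
Going back to rule 4 (this is not one of the JCSCache configurations
from rule 6), the weak-reference idea can be sketched roughly as
follows. This is an assumption-laden illustration, not the code
described in this mail and not Cocoon's event registry API: the class
WeakEventRegistry and its methods are hypothetical, it maps in one
direction only (event to cache keys), and it relies on the JDK's
WeakHashMap to drop keys that nothing else references.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical sketch: an event -> cache-key registry that holds the cache
// keys only weakly. NOT the registry from this mail, NOT Cocoon's API.
final class WeakEventRegistry<E, K> {

    // Events are held strongly; the key sets are backed by WeakHashMap, so a
    // key disappears from its set once the cache no longer references it.
    private final Map<E, Set<K>> keysByEvent = new HashMap<E, Set<K>>();

    synchronized void register(E event, K cacheKey) {
        Set<K> keys = keysByEvent.get(event);
        if (keys == null) {
            keys = Collections.newSetFromMap(new WeakHashMap<K, Boolean>());
            keysByEvent.put(event, keys);
        }
        keys.add(cacheKey);
    }

    /** Snapshot of the keys still reachable for this event. */
    synchronized Set<K> keysFor(E event) {
        Set<K> keys = keysByEvent.get(event);
        if (keys == null) {
            return Collections.<K>emptySet();
        }
        return new HashSet<K>(keys);
    }
}
```

With such a structure the registry cannot keep a cache key alive on its
own, so entries that the store expires internally no longer pile up in
the registry. The flip side, as noted in rule 5, is that a key is only
kept for as long as something else still holds a strong reference to it.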


[1] http://www.minfin.nl
[2] http://www.minbuza.nl