cocoon-dev mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: StoreJanitor (was: Re: Moving reduced version of CachingSource to core | Configuration issues)
Date Tue, 03 Apr 2007 11:12:27 GMT
Hello,
> 
> Ard Schrijvers wrote:
> > i would be glad to share the code and my ideas, for example 
> about this whole 
> StoreJanitor idea :-)  )
> 
> Just curious, what did you mean by "this whole StoreJanitor idea"?

Before I say things that are wrong, please consider that the StoreJanitor was invented long
before I looked into the cocoon code, so probably a lot of discussion and good ideas have been
around which I am not aware of. But still, here are my ideas about the StoreJanitor (and sorry
for the long mail, but perhaps it might contain something useful):

1) How it works and its intention (I think :-) ):  The StoreJanitor was originally invented
to monitor cocoon's memory usage and does this by checking some memory values every X (default
10) seconds. Besides the fact that I doubt users know how important it is to configure
the store janitor correctly, I stick to the defaults and use a heapsize just a little lower
than the JVM maxmemory.

Now, every 10 seconds, the StoreJanitor checks whether (getJVM().totalMemory() >=
getMaxHeapSize()) && (getJVM().freeMemory() < getMinFreeMemory()) is true, and if
so, the next store is chosen (relative to the previous one) and entries are removed from this
store. (I saw a post saying that in trunk a single store is no longer chosen, but an equal part
of all of them is removed, right? Probably you can configure which stores to use,
I don't know.)
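
To make the check concrete, here is a minimal Java sketch of the condition described
above. It is not the actual StoreJanitor source: the class name JanitorCheckSketch and
the threshold values in main are made up for illustration, and only the boolean
condition itself is taken from this mail.

```java
// Sketch only: mirrors the condition quoted in the mail, not Cocoon's real class.
final class JanitorCheckSketch {
    private final long maxHeapSize;   // bytes; hypothetical configured heap size
    private final long minFreeMemory; // bytes; hypothetical configured free-memory floor

    JanitorCheckSketch(long maxHeapSize, long minFreeMemory) {
        this.maxHeapSize = maxHeapSize;
        this.minFreeMemory = minFreeMemory;
    }

    // The condition from the mail, with the memory readings passed in explicitly.
    boolean memoryLow(long totalMemory, long freeMemory) {
        return totalMemory >= maxHeapSize && freeMemory < minFreeMemory;
    }

    // The same condition evaluated against the running JVM.
    boolean memoryLow() {
        Runtime jvm = Runtime.getRuntime();
        return memoryLow(jvm.totalMemory(), jvm.freeMemory());
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        JanitorCheckSketch check = new JanitorCheckSketch(500 * mb, 10 * mb);
        System.out.println(check.memoryLow(512 * mb, 100 * mb)); // false: enough free memory
        System.out.println(check.memoryLow(512 * mb, 5 * mb));   // true: heap at limit, little free
        System.out.println(check.memoryLow(256 * mb, 5 * mb));   // false: heap has not grown to the limit
    }
}
```

Note that the result depends only on the two readings at the moment of the check, so a
short allocation spike that happens to coincide with the check looks the same as a real
shortage.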

2) My observations: When running high traffic sites and rendering them live (with only mod_cache
in between, which holds pages for 5 to 10 min) like [1] or [2], checking every X sec whether
the JVM is low on memory doesn't make sense to me. At the moment of checking, the JVM might
be perfectly sound but just need some extra memory for a moment; in that case, the
StoreJanitor removes items from the cache when this is not needed. Also, when the JVM really is
in trouble but the StoreJanitor does not check for another 5 sec, this might be too long for
a JVM in a high traffic site that is low on memory. Problems that result from it are:

- Since there is no way to remove cache entries from the cache impl in use according to the
cache's eviction policy, the cache entries in memory are removed starting from entry 0, whatever
this might be in the cache. It is very likely that, at the very next request, the
same cache entries are added again.

- Once the JVM gets low on memory and the StoreJanitor is needed, it is quite likely that
from that moment on, the StoreJanitor runs *every* 10 seconds, and keeps removing cache entries
which you perhaps don't want to be removed, like compiled stylesheets.
	1) Suppose that from one store (or, since trunk, from multiple stores) 10% (default) is removed.
This 10% is of the number of memory cache entries. I quite frequently have only
200 entries in memory for each store (I have added *many* different stores to enable all
we wanted in a high traffic environment) and the rest is disk store. Now, suppose the JVM,
which has 512 MB of memory, is low on memory and removes 10% of 200 entries = 20 entries,
which does not help me at all! These memory entries are my most important ones, so on the next
request they are either added again or I have a hit from the disk cache, implying that the cache
will put this cache entry in memory again. Even if I used 2000 memory items, I am very sure the
200 items which are cleaned would be put back in memory before the next StoreJanitor run.
	2) I am not sure if in trunk you can configure whether the StoreJanitor should leave one store
alone, like the DefaultTransientStore. Compiled stylesheets and i18n resource bundles typically
end up in this store. Since these files are needed on virtually every request, I
would rather the StoreJanitor did not remove anything from this store. I think the StoreJanitor
does so anyway, leaving my "critical app" in an even worse state, and on the next request, the
hardly improved JVM needs to recompile stylesheets and reload i18n resource bundles.
	3) What if the JVM being low on memory is not because of the stores? For example, you have added
some component which has problems you did not know about, and that component is the real reason
for your OOM. The StoreJanitor sees your low memory and starts removing entries from your
perfectly sound cache, leaving your app in a much worse situation than it already was. Your
component with the memory leak now has some more memory it can fill, and happily does so, making
the StoreJanitor remove more and more entries from the cache, until it ends up with an empty
cache. You could blame the wrong component for this behavior. One of these wrong components
in use is the event registry for event caching, which made our high traffic sites with 512
MB crash every two days. It is better that I write in another mail what I did to the event cache
registry, why I did not yet post about it, whether others are interested, and how to include
it in the trunk. The bottom line is that there was a major OOM problem if the registry grows,
resulting in a StoreJanitor removing cache entries while this was actually making the
problem worse.
	4) By default, probably most people are using ehcache. Naturally, overflow-to-disk is true.
In a high traffic site, the number of cache keys can grow enormously (I have seen mails from
people complaining about disk caches growing to multiple GB), certainly when a not very
experienced user uses something like a session attr (or a timestamp, and many more possibilities)
in a stylesheet parameter which ends up in the cache key (though whether cocoon should be aimed
at high traffic sites for the average user, I don't know). Now, and this is IMO one
of the major weaknesses of ehcache (or I missed it completely), I did not find any way to
limit the number of disk store entries. This implies that the disk store can grow indefinitely.
For those who have ever looked at the status page: cache keys in memory of about 2 kB are quite
common in cocoon (actually, the depth of the folder structure of your app has an influence).
The disk store cache keys are kept in *memory*. So, suppose you run your app with 128 MB
and you have overflow-to-disk=true: your app runs into problems when there are about 50,000
keys in the cache (50,000 keys of about 2 kB each is roughly 100 MB). Then your StoreJanitor
keeps removing entries from your memory cache, which
are refilled with disk store entries just a few moments later. Now, if you really know how
to configure your stores, you use timeToLiveSeconds and timeToIdleSeconds to let your store
clear unused cache entries. This is good to do, unless you depend on something like the event
registry which is currently in cocoon trunk. The problem is that the StoreJanitor removes
cache entries by calling free() on the relevant store, which might, for example, be the
event aware store. This event aware store updates (cleans) its registry before removing the
cache entry from its delegate. But when you rely on the cache's internal cleaning through
timeToLiveSeconds or timeToIdleSeconds, the event registry is not cleaned, which will lead to
OOM in the long run.
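
The numbers in points 1) and 4) above can be checked with a few lines of Java. This is
only back-of-the-envelope arithmetic: the class and method names are made up, and it
assumes "2 kb" means 2048 bytes per key and nothing more (no per-object overhead).

```java
// Sketch only: rough arithmetic for the figures quoted in the mail.
final class CacheArithmeticSketch {
    // Number of in-memory entries removed when the janitor frees a given percentage.
    static int entriesFreed(int memoryEntries, int percent) {
        return memoryEntries * percent / 100;
    }

    // Heap taken by cache keys alone, assuming a fixed size per key.
    static long keyFootprintBytes(long keyCount, long bytesPerKey) {
        return keyCount * bytesPerKey;
    }

    public static void main(String[] args) {
        System.out.println(entriesFreed(200, 10));  // 20
        System.out.println(entriesFreed(2000, 10)); // 200

        long bytes = keyFootprintBytes(50000L, 2048L);
        System.out.println(bytes / (1024L * 1024L) + " MB of keys"); // 97 MB of keys
    }
}
```

So with a 128 MB heap, about 50,000 keys of that size already take roughly three
quarters of the heap, before a single cached response is held in memory.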


I have more to say about it, but probably nobody will read it anymore. In short, my conclusion
is that the StoreJanitor never helped me out, but only made my app worse when it ran.

						--------o0o--------

The rules I try to follow to keep the StoreJanitor from running:

1) use readers in noncaching pipelines and use expires on them to avoid cache/memory pollution
2) use a different store for repository binary sources which has only a disk store part and
no memory part (cached-binary: protocol added)
3) use a different store for repository sources than for the pipeline cache
4) replaced the abstract double mapping event registry with one that uses weakreferences, and
let the JVM clean up my event registry
5) (4) gave me undesired behavior, with weakrefs being removed in combination with ehcache when
overflowing items to disk (I could not reproduce this, but it seems that my references to
cachekeys got lost). Testing with JCSCache solved this problem, gave me faster response times
and gave me for free a way to limit the number of disk cache entries. A disadvantage of the
weakreferences is that I had to disable persistent caches across JVM restarts, but I never
wanted this anyway (it might be implemented quite easily, but might lead to long start up times)
6) JCSCache has a complex configuration IMO. Therefore, I added default configurations to choose
from, for example:


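As an illustration of rule 4 above (not of the JCSCache configurations), here is a rough
Java sketch of an event registry that holds cache keys only weakly, so that the JVM can
drop a key once nothing else references it. This is not the code from this mail or from
cocoon trunk: the class WeakEventRegistrySketch and its methods are hypothetical, events
are simplified to strings, and it uses Java 8+ syntax for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Sketch only: maps an event name to the cache keys that depend on it.
final class WeakEventRegistrySketch {
    private final Map<String, Set<Object>> keysByEvent = new HashMap<>();

    // Remember that cacheKey must be invalidated when the event fires.
    // The key is held weakly: once it is no longer strongly reachable elsewhere,
    // the garbage collector may remove it from the registry.
    synchronized void register(String event, Object cacheKey) {
        keysByEvent
            .computeIfAbsent(event,
                e -> Collections.newSetFromMap(new WeakHashMap<Object, Boolean>()))
            .add(cacheKey);
    }

    // Snapshot of the keys still registered (and still alive) for the event.
    synchronized Set<Object> keysFor(String event) {
        Set<Object> keys = keysByEvent.get(event);
        return keys == null ? new HashSet<>() : new HashSet<>(keys);
    }

    // Forget an event entirely, e.g. after its keys have been invalidated.
    synchronized void removeEvent(String event) {
        keysByEvent.remove(event);
    }
}
```

The registry only works as long as the cache itself keeps a strong reference to the same
key instance. A key that is written to disk and dropped from memory is no longer strongly
referenced, which would be consistent with the lost references described in rule 5,
although the mail does not confirm that cause. Empty per-event sets are also never pruned
in this sketch.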


[1] http://www.minfin.nl
[2] http://www.minbuza.nl
