From: "Ard Schrijvers"
Reply-To: dev@cocoon.apache.org
Subject: RE: StoreJanitor (was: Re: Moving reduced version of CachingSource to core | Configuration issues)
Date: Tue, 3 Apr 2007 13:12:27 +0200

Hello,

> Ard Schrijvers wrote:
> > i would be glad to share the code and my ideas, for example about
> > this whole StoreJanitor idea :-) )
>
> Just curious, what did you mean by "this whole StoreJanitor idea"?

Before I say things that are wrong, please consider that the StoreJanitor was invented long before I looked into the Cocoon code, so there has probably been a lot of discussion and many good ideas that I am not aware of. But still, here are my ideas about the StoreJanitor (and sorry for the long mail, but perhaps it contains something useful):

1) How it works and its intention (I think :-) ):

The StoreJanitor was originally invented to monitor Cocoon's memory usage, and it does this by checking some memory values every X (default 10) seconds. Apart from the fact that I doubt users know how important it is to configure the StoreJanitor correctly, I stick to the defaults and use a heapsize just a little lower than the JVM max memory.

Now, every 10 seconds, the StoreJanitor checks whether

    getJVM().totalMemory() >= getMaxHeapSize() && getJVM().freeMemory() < getMinFreeMemory()

is true, and if so, the next store is chosen (relative to the previous one) and entries are removed from this store. (I saw a post saying that in trunk a single store is no longer chosen, but an equal part of all of them is removed, right? Probably you can configure which stores to use, I don't know.)

2) My observations:

When running high traffic sites and rendering them live (with only mod_cache in between, which holds pages for 5 to 10 min), like [1] or [2], checking every X seconds whether the JVM is low on memory doesn't make sense to me. At the moment of checking, the JVM might be perfectly sound and just have needed some extra memory for a moment; in that case, the StoreJanitor removes items from the cache although this is not needed.
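
To make the check concrete, here is a minimal Java sketch of the condition as I read it. The class name, the method name and the threshold values are my own illustration, not the actual StoreJanitor code or Cocoon defaults; only the Runtime calls are real JDK API.

```java
/** Sketch of the periodic low-memory test; not the real StoreJanitor implementation. */
final class JanitorCheckSketch {

    /** Mirrors: totalMemory() >= maxHeapSize && freeMemory() < minFreeMemory. */
    static boolean memoryLow(long totalMemory, long freeMemory,
                             long maxHeapSize, long minFreeMemory) {
        return totalMemory >= maxHeapSize && freeMemory < minFreeMemory;
    }

    public static void main(String[] args) {
        Runtime jvm = Runtime.getRuntime();
        // Assumed thresholds for illustration: heapsize a little below JVM max memory.
        long maxHeapSize = jvm.maxMemory() - 8L * 1024 * 1024;
        long minFreeMemory = 2L * 1024 * 1024;
        // A single snapshot: it says nothing about memory one second earlier or later.
        boolean low = memoryLow(jvm.totalMemory(), jvm.freeMemory(),
                                maxHeapSize, minFreeMemory);
        System.out.println("memory low at this instant: " + low);
    }
}
```

The sketch only shows that the decision rests on one instantaneous reading of two values, which is why a short allocation spike at the moment of the check can trigger a cleanup.
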
Also, when the JVM really is in trouble but the StoreJanitor will not check for another 5 seconds, this might be too long for a JVM in a high traffic site that is low on memory.

Problems that result from it are:

- Since there is no way to remove entries from the cache implementation in use according to the cache's own eviction policy, the entries are removed from memory starting at entry 0, whatever that happens to be in the cache. It is very likely that at the very next request the same cache entries are added again.
- Once the JVM gets low on memory and the StoreJanitor is needed, it is quite likely that from that moment on the StoreJanitor runs *every* 10 seconds and keeps removing cache entries which you perhaps don't want to be removed, like compiled stylesheets.

1) Suppose 10% (the default) is removed from one store (or, since trunk, from multiple stores). This 10% is taken from the number of memory cache entries. I quite frequently have only 200 entries in memory for each store (I have added *many* different stores to enable all we wanted in a high traffic environment), and the rest is disk store. Now suppose the JVM, which has 512 MB of memory, is low on memory and removes 10% of 200 entries = 20 entries: this helps me zero! These memory entries are my most important ones, so on the next request they are either added again, or I have a hit from the disk cache, which means the cache puts the entry in memory again. If I used 2000 memory items, I am very sure the 200 items that are cleaned would be put back in memory before the next StoreJanitor run.

2) I am not sure whether in trunk you can configure the StoreJanitor to leave one store alone, like the DefaultTransientStore. Compiled stylesheets and i18n resource bundles typically end up in this store. Since these are needed on virtually every request, I would rather the StoreJanitor did not remove anything from this store.
I think the StoreJanitor does so anyway, leaving my "critical app" in an even worse state, and on the next request the hardly improved JVM needs to recompile stylesheets and i18n resource bundles.

3) What if the JVM being low on memory is not because of the stores? For example, you have added some component with problems you did not know about, and that component is the real reason for your OOM. The StoreJanitor sees your low memory and starts removing entries from your perfectly sound cache, leaving your app in a much worse situation than it already was. Your component with the memory leak now has some more memory to fill, and happily does so, making the StoreJanitor remove more and more entries from the cache until it ends up with an empty cache. You could blame the faulty component for this behavior. One of these faulty components in use is the event registry for event caching, which made our high traffic sites with 512 MB crash every two days. It is better that I write in another mail what I did to the event cache registry, why I did not yet post about it, whether others are interested, and how to include it in trunk. The bottom line is that there was a major OOM problem when the registry grows, resulting in a StoreJanitor removing cache entries while this was actually making the problem worse.

4) By default, probably most people are using ehcache. Naturally, overflow-to-disk is true. In a high traffic site, the number of cache keys can grow enormously (I have seen mails from people complaining about disk caches growing to multiple GB). This certainly happens when a not very experienced user puts something like a session attribute (or a timestamp, and there are many more possibilities) in a stylesheet parameter that ends up in the cache key (but perhaps the question is whether Cocoon should be targeted at high traffic sites for the average user, I don't know).
Now, and this is IMO one of the major weaknesses of ehcache (or I missed it completely), I did not find any way to limit the number of disk store entries. This implies that the disk store can grow indefinitely. For those who have ever looked at the status page: cache keys of about 2 KB in memory are quite common in Cocoon (the depth of your app's folder structure actually influences this). The disk store cache keys are kept in *memory*. So suppose you run your app with 128 MB and have overflow-to-disk=true: your app runs into problems when there are about 50,000 keys in the cache (50,000 keys of about 2 KB is roughly 100 MB). Then your StoreJanitor keeps removing entries from your memory cache, which are refilled with disk store entries just a few moments later.

Now, if you really know how to configure your stores, you use timeToLiveSeconds and timeToIdleSeconds to let your store clear unused cache entries. This is good to do, unless you depend on something like the event registry which is currently in Cocoon trunk. The problem is that the StoreJanitor removes cache entries by calling free on the store concerned, which might for example be the event-aware store. This event-aware store updates (cleans) its registry before removing the cache entry from its delegate.
Now, when you rely on the caches' internal cleaning through timeToLiveSeconds or timeToIdleSeconds, the event registry is not cleaned, and this will lead to OOM in the long run.

I have more to say about it, but probably nobody would read it anymore. In short, my conclusion is that the StoreJanitor never helped me out, but merely impoverished my app when it ran.

--------o0o--------

The rules I try to follow to avoid the StoreJanitor running:

1) Use readers in noncaching pipelines and set expires on them to avoid cache/memory pollution.

2) Use a different store for repository binary sources, one which has only a disk store part and no memory part (cached-binary: protocol added).

3) Use a different store for repository sources than for the pipeline cache.

4) I replaced the abstract double mapping event registry with one that uses weak references, and let the JVM clean up my event registry.

5) (4) gave me undesired behavior in combination with ehcache: weak references were removed when items overflowed to disk (I could not reproduce this, but it seems that my references to cache keys got lost). Testing with JCSCache solved this problem, gave me faster response times and gave me, for free, a way to limit the number of disk cache entries. A disadvantage of the weak references is that I disabled persistent caches across JVM restarts, but I never wanted those anyway (this could be implemented quite easily, but might lead to long start-up times).

6) JCSCache has a complex configuration IMO. Therefore, I added default configurations to choose from, for example:

[1] http://www.minfin.nl
[2] http://www.minbuza.nl
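
As a back-of-the-envelope check on two numbers from the observations above (10% of 200 in-memory entries, and about 50,000 keys of roughly 2 KB in a 128 MB heap), here is a small Java sketch. The class and method names are purely illustrative, and "2 KB" is taken as 2048 bytes, which is an assumption; real key sizes vary with the application.

```java
/** Rough figures only; class and method names are hypothetical, not Cocoon API. */
final class StoreNumbersSketch {

    /** Entries freed in one run when a percentage of the in-memory entries is removed. */
    static long entriesFreed(long memoryEntries, int percentToFree) {
        return memoryEntries * percentToFree / 100;
    }

    /** Heap needed just for the keys, assuming every key is held in memory. */
    static long keyHeapBytes(long keyCount, long bytesPerKey) {
        return keyCount * bytesPerKey;
    }

    public static void main(String[] args) {
        // 10% of 200 in-memory entries.
        System.out.println(entriesFreed(200, 10) + " entries freed per run");
        // 50,000 keys at an assumed 2048 bytes each.
        long bytes = keyHeapBytes(50_000, 2 * 1024);
        System.out.println(bytes / (1024 * 1024) + " MiB of keys in a 128 MiB heap");
    }
}
```

Under these assumptions a run frees only 20 entries, while the keys alone take up most of a 128 MB heap.
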