cocoon-dev mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: StoreJanitor
Date Wed, 04 Apr 2007 08:07:50 GMT

> 
> AFAICS there are two freeing algorithms in trunk: round-robin 
> and all-stores.

I already thought it would be something like this

</snip>

> and this is IMO one of the major weaknesses of ehcache (or I 
> missed it 
> completely), I did not find any way to limit the number of 
> disk store entries.
> 
> Actually we don't configure this value. According to 
> http://ehcache.sourceforge.net/documentation/configuration.html
> the default value is 0 meaning unlimited. We should use the 1.2.4 
> constructor that allows to set a maxElementsOnDisk parameter.

That was added to ehcache only recently, right? I had never seen it, but in
my opinion it is extremely important to set it to a sensible value. Cocoon
uses some quite ingenious caching tricks, but the average user won't be
aware of the millions of cache entries you can leave behind (for example, by
putting a timestamp in a cache key).
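
To make the timestamp-in-the-key problem concrete, here is a minimal sketch that uses only the JDK. It is not Cocoon's or EHCache's store: the class name, the `boundedLru` helper and the fake timestamps are all made up for illustration. It only shows why an entry limit matters once every request produces a new key.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not Cocoon's or EHCache's actual store. */
class BoundedStoreSketch {

    /** A tiny LRU map that evicts the eldest entry once maxEntries is exceeded. */
    static <K, V> Map<K, V> boundedLru(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Simulates requests whose cache key embeds a timestamp, so no key ever repeats. */
    static int fill(Map<String, String> store, int requests) {
        for (int i = 0; i < requests; i++) {
            long fakeTimestamp = 1175674070000L + i; // stand-in for System.currentTimeMillis()
            store.put("page.html?ts=" + fakeTimestamp, "cached response");
        }
        return store.size();
    }

    public static void main(String[] args) {
        // Without a limit the store keeps one entry per request.
        System.out.println("unbounded: " + fill(new HashMap<String, String>(), 100000));
        // With a limit the store stays at the configured size.
        System.out.println("bounded:   "
                + fill(BoundedStoreSketch.<String, String>boundedLru(1000), 100000));
    }
}
```

A real disk store would need the same kind of upper bound, which is what a maxElementsOnDisk setting would provide.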

> 
> I wonder what StoreJanitor is good for at all. EHCache takes 
> care that the number of items in the memory cache doesn't grow 
> indefinitely and starts its own cleanup threads for the disk store 
> (http://ehcache.sourceforge.net/documentation/storage_options.html#DiskStore). 
> JCS will probably do the same. 

Yes, this is exactly my point. The extra problem is that the StoreJanitor never
has access to the eviction policy of the cache, and just starts throwing out
entries "at random". From my experience, my app only runs solidly when the
StoreJanitor never runs :-)
Therefore, I have created a few store size options to choose from, matching
different JVM memory sizes. Then, when the app is "finished", I crawl the site
(xenu [1]) for an hour and look at the status generator memory usage, or the
YourKit profiler or something. If I see the nicely shaped sawtooth (is this
only Dutch? :-) ) of memory usage, the stores are configured correctly.


> I guess that original purpose 
> of StoreJanitor was 
> when Cocoon had its own store implementations (transient, 
> persistent) and we had 
> to take care of cleaning them up in our code.

That must indeed have been the reason (I did not know this; it was before my
time, so I never understood how the StoreJanitor would ever help me out).

> Only the persistent store can grow unlimited but since it 
> should only be used 
> for special usecases, it shouldn't be a real problem.
> 

</snip>

> 
> 
> What do we want to do in order to improve the situation? 
> After reading your mail 
> and from my own experience I'd say
> 
>   - introduce a maxPersistentObjects parameter and use it in 
> EHDefaultCache to set maxElementsOnDisk

+1 

>   - make the registration of stores at StoreJanitor configurable
>     (Though I wonder what the default value should be, true or false?)

0: I would avoid letting the StoreJanitor run anyway

>   - fix EventRegistry

+1: I have fixed this locally so that it also works when cache entries are
removed by the internals of the cache.
I did this by using WeakReferences instead of the
AbstractDoubleMapEventRegistry, so that when the cache keys aren't present
anymore, the JVM itself cleans the registry. Two problems:

1: I removed the persistent cache between JVM restarts, but could rebuild this
(at the cost of long startup times, though).
2: With former versions of EHCache, my weak references were not honoured when
cache entries were overflowed to disk.
Therefore, I thought EHCache might be doing something with the cache key when
moving it to the disk cache key map. I could only see this behaviour in
combination with Cocoon, and not when I tested EHCache separately.
On the EHCache user list, Greg told me that this was not possible, and also
showed it.
I am now using JCSCache, which I am pretty OK with (only the configuration is
hard).
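
For readers who want to picture the weak-reference idea, here is a rough sketch using the JDK's `WeakHashMap`. It is not the local patch described above: the class and method names are invented, and it ignores persistence entirely. It assumes the cache holds the only strong reference to each cache key, so that an evicted key becomes collectable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

/** Hypothetical sketch of an event registry that holds cache keys only weakly. */
class WeakEventRegistrySketch {

    // cache key -> event names; an entry disappears once its key is no longer
    // strongly referenced anywhere else (for example after the cache evicted it)
    private final Map<Object, Set<String>> eventsByKey = new WeakHashMap<>();

    synchronized void register(String event, Object cacheKey) {
        eventsByKey.computeIfAbsent(cacheKey, k -> new HashSet<>()).add(event);
    }

    /** Returns the cache keys that are still alive and registered for the event. */
    synchronized List<Object> keysForEvent(String event) {
        List<Object> keys = new ArrayList<>();
        for (Map.Entry<Object, Set<String>> entry : eventsByKey.entrySet()) {
            if (entry.getValue().contains(event)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    synchronized int size() {
        return eventsByKey.size();
    }

    public static void main(String[] args) {
        WeakEventRegistrySketch registry = new WeakEventRegistrySketch();
        Object key = new Object(); // stand-in for a pipeline cache key
        registry.register("document-changed", key);
        System.out.println(registry.keysForEvent("document-changed").size());
    }
}
```

The trade-off in this sketch is that looking up an event scans all keys, and the values must never reference the key strongly, otherwise nothing is ever collected.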

By the way, if we start fixing the other things, like setting a maximum number
of disk objects, the OOMs due to the event registry will increase.
This is a problem with MultiHashMap (and also with its non-deprecated
replacement): when you do
map.put("1","test");
map.put("1","test");

you have two values for key "1". 
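
The effect can be modelled with plain JDK collections. The sketch below does not use the commons-collections class itself; it assumes a multimap whose values are kept in a list per key, and compares it with a set-backed variant that would not grow on repeated puts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of list-backed versus set-backed multimap behaviour. */
class MultiMapDuplicatesSketch {

    /** List-backed: every put appends, even when the same value is already stored. */
    static int putTwiceListBacked() {
        Map<String, List<String>> map = new HashMap<>();
        map.computeIfAbsent("1", k -> new ArrayList<>()).add("test");
        map.computeIfAbsent("1", k -> new ArrayList<>()).add("test");
        return map.get("1").size();
    }

    /** Set-backed: the second identical put is ignored. */
    static int putTwiceSetBacked() {
        Map<String, Set<String>> map = new HashMap<>();
        map.computeIfAbsent("1", k -> new HashSet<>()).add("test");
        map.computeIfAbsent("1", k -> new HashSet<>()).add("test");
        return map.get("1").size();
    }

    public static void main(String[] args) {
        System.out.println("list-backed values for key 1: " + putTwiceListBacked());
        System.out.println("set-backed values for key 1:  " + putTwiceSetBacked());
    }
}
```

With a registry that is written to on every cache put, the list-backed variant keeps growing even though the set of distinct keys does not.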


> 
> Any further ideas?

Hmmm, yes, but I am not sure whether others will like it: I think it might be
good that, when the StoreJanitor runs, there is at least an info message
(error level...? I frequently want to put information in messages that is so
important that it must be at error level to not be missed, but this is stupid,
right?) about possible problems:

either:
1) your JVM memory settings are too low
2) your stores are configured to have too many memory items
3) your cached objects are very large
4) you have a memory leak in some custom component (a little vague, yes :-) )
....etc
Try running a crawler (xenu) and watch your status page memory usage.
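
Such a message could look roughly like the sketch below. This is not the StoreJanitor code: the class, the `minFreeBytes` parameter and the choice of the warning level are assumptions made for illustration, and it uses java.util.logging only to stay self-contained.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of a diagnostic message logged when memory runs low. */
class JanitorWarningSketch {

    private static final Logger LOG =
            Logger.getLogger(JanitorWarningSketch.class.getName());

    static boolean memoryLow(long availableBytes, long minFreeBytes) {
        return availableBytes < minFreeBytes;
    }

    /** Builds the hint text listing the possible causes named above. */
    static String hint(long availableBytes, long minFreeBytes) {
        return "Freeing cache entries: " + availableBytes + " bytes available, below the"
                + " configured minimum of " + minFreeBytes + " bytes. Possible causes:"
                + " (1) JVM memory settings are too low,"
                + " (2) stores are configured to have too many memory items,"
                + " (3) cached objects are very large,"
                + " (4) a memory leak in some custom component."
                + " Try running a crawler and watch the status page memory usage.";
    }

    /** Heap the JVM can still hand out: free part of the current heap plus room to grow. */
    static long availableHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    static void warnIfLow(long minFreeBytes) {
        long available = availableHeap();
        if (memoryLow(available, minFreeBytes)) {
            LOG.warning(hint(available, minFreeBytes));
        }
    }

    public static void main(String[] args) {
        warnIfLow(64L * 1024 * 1024); // arbitrary example threshold
    }
}
```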

Another improvement might be to try to keep binary readers from putting entries
in the memory cache. But this might be too complex for the average user. In
principle, I have been bugging everybody here to:

1) use readers in *noncaching* pipelines, and use appropriate expires times in
the readers; this is very important for fast pages, because browsers honour
the expires time
2) we also read binaries from our repository: these obviously need to be
cached, but what if they are mp3 files of 15 MB apiece? Storing these in a
normal store...so I added a protocol, cached-binary:, which in our setup uses a
different store that is configured to have no memory part, only a disk cache.
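
The second point boils down to picking a store by protocol. A minimal sketch of that decision follows; apart from the cached-binary: prefix, every name in it (the class, the `StoreConfig` type, the store names) is hypothetical and not taken from Cocoon.

```java
/** Hypothetical sketch: route cached-binary: sources to a disk-only store. */
class StoreSelectionSketch {

    /** Made-up store settings: whether a memory part and a disk part are used. */
    static final class StoreConfig {
        final String name;
        final boolean useMemory;
        final boolean useDisk;

        StoreConfig(String name, boolean useMemory, boolean useDisk) {
            this.name = name;
            this.useMemory = useMemory;
            this.useDisk = useDisk;
        }
    }

    static final StoreConfig DEFAULT_STORE = new StoreConfig("default", true, true);
    static final StoreConfig BINARY_STORE = new StoreConfig("binary-disk-only", false, true);

    /** Large binaries go to the store without a memory part; everything else stays put. */
    static StoreConfig storeFor(String uri) {
        return uri.startsWith("cached-binary:") ? BINARY_STORE : DEFAULT_STORE;
    }

    public static void main(String[] args) {
        System.out.println(storeFor("cached-binary://repository/audio/track.mp3").name);
        System.out.println(storeFor("cocoon:/content/page.xml").name);
    }
}
```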


Then again, perhaps the above isn't something we can code (except for changing
some things regarding having multiple event registries),
but...perhaps I should wikify it for advanced usage? It is quite a lot of
stuff, though.

Sometimes people have complained to me that
1) Cocoon caching is difficult
2) nobody explained before how cache keys work, the status generator cache key
overview, how validities work, etc.

But I doubt there are frameworks around where you get so much ingenious caching
for free, where 95% of the users never have to know about it. And when you want
to run sites with > 100.000 pages, you do need to know more about it. I think
that is normal.


I think it is brilliant of Cocoon that we run sites of 100.000 pages with many
users and editors, which never go down and run everything live with eventcache,
and have cached response times within 32 ms (and my latest setups, a skeleton
generator with standard conf and sitemaps, even go to 0-15 ms).
I did not get this for free. It took me around 3 months to have everything
configured/rebuilt/added and understood correctly.
I am not sure about the best way to make it free for everybody, without needing
to understand it all (or at least to get proper info about it).
 
WDYT?

Ard

> 
> 
> P.S. Ard, answering to your mails is very difficult because 
> there are no line 
> breaks. Is anybody else experiencing the same problem or is 
> it only me?

For the moment I am putting in line breaks with Enter, but that probably
doesn't make it any better, does it?
Sorry if so; I will try switching to Thunderbird if it is still a problem.

Ard

[1] http://home.snafu.de/tilman/xenulink.html

> 
> -- 
> Reinhard Pötz           Independent Consultant, Trainer & (IT)-Coach 
> 
> {Software Engineering, Open Source, Web Applications, Apache Cocoon}
> 
>                                         web(log): http://www.poetz.cc
> --------------------------------------------------------------------
> 
