jackrabbit-oak-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: [Document Cache Size] Is it better to have cache size using number of entries
Date Wed, 20 Aug 2014 08:28:30 GMT
Hi,

If we need a limit on the number of entries for some other (internal)
reason, like the consistency check, then I understand. If we later find a
way to speed up the consistency check (or if we don't need it, which I
would prefer), then this limit is no longer needed. But I also don't know
how to limit by both the number of entries and the amount of memory using
the Guava cache API.
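
For reference, here is a rough sketch of what the memory-based limit looks
like with Guava (the CachedDocument type and its estimateMemory() method
are made up for illustration, not the actual Oak classes). CacheBuilder
accepts either maximumSize or maximumWeight, but not both on the same
cache:

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.Weigher;

    // Budget of roughly 256 MB, shared by however many entries fit.
    Cache<String, CachedDocument> cache = CacheBuilder.newBuilder()
            .maximumWeight(256L * 1024 * 1024)
            .weigher(new Weigher<String, CachedDocument>() {
                @Override
                public int weigh(String key, CachedDocument doc) {
                    // approximate memory footprint of one entry
                    return key.length() * 2 + doc.estimateMemory();
                }
            })
            .build();
    // Calling maximumSize(...) on the same builder would throw an
    // IllegalStateException, so entries and memory can't be capped together.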

> why is 256MB -- the default value -- sufficient/insufficient

We don't know. But how do you know that a cache of 10'000 "entries" is
sufficient? Especially if each entry can be either 1 KB, 1 MB, or 20 MB.
The available memory can be divided into different areas, with each
component given a part of it. Then you look at performance, see which
component is slow, and try to find out why. What counts as a good size
also depends on how expensive a cache miss is.

As for the cache size as an amount of memory: the best way to find a good
number is to analyze the performance (how much time is spent reading, the
cache hit ratio, ...).
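
If the cache is a Guava cache, the hit ratio is cheap to collect; a minimal
sketch, again using the hypothetical CachedDocument type from above:

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheStats;

    // recordStats() makes the cache keep hit/miss/load counters; the size
    // limit and weigher would be configured as in the sketch above.
    Cache<String, CachedDocument> cache = CacheBuilder.newBuilder()
            .recordStats()
            .build();

    // ... exercise the cache ...

    CacheStats stats = cache.stats();
    System.out.println("hit rate: " + stats.hitRate()
            + ", avg load penalty (ns): " + stats.averageLoadPenalty());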

> what should the course of action when seeing a lot of cache misses: (a)
>notify application team, or (b) increase cache size.

It depends on the reason for the cache misses. There could be a loop over
many nodes somewhere, in which case a larger cache might not really help
(most caches are not scan resistant). There could be other reasons. But I
don't see how the ability to configure the number of entries in the cache
would help.

Regards,
Thomas

On 19/08/14 16:25, "Vikas Saurabh" <vikas.saurabh@gmail.com> wrote:

>>> the sysadmin can be provided with a rough idea of the number of
>>> frequently used repo nodes, and can use that to update the cache size.
>>
>> I can't follow you, sorry. How would a sysadmin possibly know the number
>> of frequently used nodes? And why would he know that, and not the amount
>> of memory? And why wouldn't he worry about running into an out-of-memory
>> error?
>>
>> Even for off-heap caches, I think it's still important to limit the
>> memory. Even though you don't get an out-of-memory exception, you would
>> still run out of physical memory, at which point the system would get
>> extremely slow (virtual memory thrashing).
>
>What I meant was that there was no way for me to guess a good number for
>the document cache (e.g. why 256MB -- the default value -- is sufficient
>or insufficient), even knowing what type of load I (as application
>engineer) plan to put on an author instance. I understand that memory
>usage is the bottom line and the sysadmin must configure that too -- but
>from a sysadmin point of view, what should the course of action be when
>seeing a lot of cache misses: (a) notify the application team, or (b)
>increase the cache size? Yes, at the end of the day there would be a
>balance between these two options -- but from an app engineer's point of
>view, I have no idea what cache size is useful or sufficient, or even how
>to map a given size in bytes to the kind of access I'd plan on this
>repository, which kind of nullifies option (a). I don't know, for sure,
>about general deployments, but in our case the engineering team does
>recommend heap size and other JVM settings (and possibly tweak levels)
>to the sysadmin team -- I thought that's how setups usually are done.
>
>Thanks,
>Vikas

