jackrabbit-oak-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: [Document Cache Size] Is it better to have cache size using number of entries
Date Tue, 19 Aug 2014 12:43:19 GMT
Hi,

Limiting the cache size by number of entries doesn't make sense. It is a
sure way to run out of memory, precisely because document sizes vary a
lot.
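
To illustrate, here is a minimal sketch, assuming documents are plain byte
arrays held in a simple LRU map. BoundedCacheSketch and its methods are
hypothetical names made up for this example, not Oak or Guava code; in
Guava the byte limit would roughly correspond to
CacheBuilder.maximumWeight with a Weigher. The sketch evicts least
recently used entries until both the entry limit and the byte limit hold.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Oak or Guava.
class BoundedCacheSketch {
    private final long maxEntries;
    private final long maxBytes;
    private long currentBytes = 0;
    // Access-order map: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, byte[]> map =
            new LinkedHashMap<>(16, 0.75f, true);

    BoundedCacheSketch(long maxEntries, long maxBytes) {
        this.maxEntries = maxEntries;
        this.maxBytes = maxBytes;
    }

    void put(String key, byte[] doc) {
        byte[] old = map.put(key, doc);
        if (old != null) {
            currentBytes -= old.length;
        }
        currentBytes += doc.length;
        // Evict from the LRU end until both limits are satisfied.
        Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
        while ((map.size() > maxEntries || currentBytes > maxBytes) && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    long bytes() { return currentBytes; }

    int entries() { return map.size(); }

    public static void main(String[] args) {
        // Limit by entry count only: memory use depends entirely on document size.
        BoundedCacheSketch byCount = new BoundedCacheSketch(100, Long.MAX_VALUE);
        // Limit by bytes only (1 MB): memory use is capped regardless of document size.
        BoundedCacheSketch byBytes = new BoundedCacheSketch(Long.MAX_VALUE, 1024 * 1024);
        for (int i = 0; i < 200; i++) {
            byte[] doc = new byte[64 * 1024]; // assumed document size of 64 KB
            byCount.put("doc" + i, doc);
            byBytes.put("doc" + i, doc);
        }
        System.out.println("by count: " + byCount.entries() + " entries, "
                + byCount.bytes() + " bytes");
        System.out.println("by bytes: " + byBytes.entries() + " entries, "
                + byBytes.bytes() + " bytes");
    }
}
```

With 64 KB documents the count-limited cache holds 100 entries (6,553,600
bytes), and that figure scales with whatever the documents happen to
weigh. The byte-limited cache stays at 1 MB and simply holds fewer
entries (16 here).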

> as you mentioned limit by count is more deterministic.

How, or in what way, is it more deterministic?

> sysadmin can be provided with a rough idea about relation of (frequently
>used) repo nodes using which sysadmin can update cache size.

I can't follow you, sorry. How would a sysadmin possibly know the number
of frequently used nodes? And why would he know that, and not the amount
of memory? And why wouldn't he worry about running into out of memory?

Even for off-heap caches, I think it's still important to limit the
memory. Even though you don't get an out-of-memory exception, you would
still run out of physical memory, at which point the system would get
extremely slow (virtual memory thrashing).

Regards,
Thomas



On 19/08/14 08:30, "Chetan Mehrotra" <chetan.mehrotra@gmail.com> wrote:

>Hi Vikas,
>
>Sizing the cache can be done by either number of entries or the size
>taken by cache. Currently in Oak we limit by size however as you
>mentioned limit by count is more deterministic. We use Guava Cache and
>it supports either limiting by size or by number of entries i.e. the
>two policies are exclusive.
>
>So at minimum if you can provide a patch which allows the admin to
>choose between the two it would allow us to experiment and later see
>how we can put a max cap on cache size.
>Chetan Mehrotra
>
>
>On Mon, Aug 18, 2014 at 7:55 PM, Vikas Saurabh <vikas.saurabh@gmail.com>
>wrote:
>>>> we can probably have both and cache respects whichever constraint hits
>>>> first (sort of min(byte size, entry size)).
>>> First of all I don't know MongoNS implementation details so I can be
>>> wrong.
>>>
>>> I'd rather keep the size in bytes as it gives me much more control over
>>> the memory I have and what I decide to provide to the application. If
>>> we say, to take an extreme example, 1 document only in cache and then
>>> this single document exceeds the amount of available memory, I fear an
>>> OOM. On the other hand, a limit in bytes ensures the application keeps
>>> working, and it will be the task of a sysadmin to monitor the eventual
>>> hit/miss ratio to adjust the cache accordingly.
>>>
>> Yes, sysadmin can modify cache size in bytes if miss ratio increases.
>> But, in the current scenario, I couldn't figure out a neat way
>> (heuristic/guesswork) to tell whether it's application misbehavior or
>> lack of cache size (notice our issue didn't happen to be related to
>> cache size... but still the question did bug us). On the other hand, a
>> sysadmin can be provided with a rough idea about relation of
>> (frequently used) repo nodes using which sysadmin can update cache size.
>> Also, I do take the point of avoiding OOMs in case of pretty large
>> documents, which is why we can have both properties (byte size and
>> entry count) with the byte constraint being a fail-safe.
>>
>> Thanks,
>> Vikas

