jackrabbit-oak-dev mailing list archives

From Chetan Mehrotra <chetan.mehro...@gmail.com>
Subject Reduce number of calls from Oak to Mongo DB on restarts
Date Fri, 25 Oct 2013 15:32:17 GMT
Hi,

Restarting an application like Adobe CQ running on Oak with MongoDB on a
remote system takes a considerable amount of time. In a typical restart
the number of calls from Oak to Mongo is around 23000 (reduced from 42000
with OAK-1117). I am trying to analyze the nature of these calls and the
cache utilization to see if they can be reduced further in OAK-1119.

Looking at the various logs, the following things stand out:

* Queries made to fetch children account for 18000 of the 23000 calls.
Such queries also populate the doc cache, hence the number of explicit
find queries for individual docs is quite low (~400)
* The utilization of nodeChildrenCache is quite poor
* The number of updates is low, yet we still see cache entries for the
same path at different revisions
* Checking the entries of the nodeCache and nodeChildrenCache, which
use a path@revision key, shows that there are quite a few entries at
different revisions for the same path.

Based on the above we can look into the following aspects:

A - Caching strategy for nodeCache and nodeChildrenCache - I think the
current logic caches a node at the revision it is asked for and not
at the revision at which it actually exists. For example, if a client
asks for /foo/bar at revision 100 and the actual latest valid revision
for /foo/bar is 70, we still cache it under the key /foo/bar@100. If a
different client asks for revision 105 and the revision of /foo/bar is
still 70, we make a new cache entry at /foo/bar@105.
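
To make A concrete, here is a rough sketch of a cache that stores a node under the revision at which it was last modified instead of the revision that was asked for. This is not Oak code: the class RevisionAwareCache, its loader callback and the use of plain long values as revisions are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical sketch, not Oak code. Revisions are simplified to plain longs.
class RevisionAwareCache {

    // Node state as returned by the backend, tagged with the revision of its last change.
    static final class Node {
        final String path;
        final long lastModified;

        Node(String path, long lastModified) {
            this.path = path;
            this.lastModified = lastModified;
        }
    }

    // One cached state plus the highest read revision at which it was confirmed current.
    private static final class Entry {
        final Node node;
        long validUpTo;

        Entry(Node node, long validUpTo) {
            this.node = node;
            this.validUpTo = validUpTo;
        }
    }

    // path -> (lastModified revision -> entry)
    private final Map<String, TreeMap<Long, Entry>> byPath = new HashMap<>();
    // Stand-in for the remote read; assumed to never return null.
    private final BiFunction<String, Long, Node> loader;
    private int loads;

    RevisionAwareCache(BiFunction<String, Long, Node> loader) {
        this.loader = loader;
    }

    Node get(String path, long readRevision) {
        TreeMap<Long, Entry> revs = byPath.computeIfAbsent(path, p -> new TreeMap<>());
        // Newest cached state that was modified at or before the read revision.
        Map.Entry<Long, Entry> floor = revs.floorEntry(readRevision);
        if (floor != null && floor.getValue().validUpTo >= readRevision) {
            return floor.getValue().node; // hit: state known to be unchanged up to readRevision
        }
        loads++;
        Node node = loader.apply(path, readRevision);
        Entry entry = revs.get(node.lastModified);
        if (entry == null) {
            revs.put(node.lastModified, new Entry(node, readRevision));
        } else {
            // Same state as before: widen its validity instead of adding a second entry.
            entry.validUpTo = Math.max(entry.validUpTo, readRevision);
        }
        return node;
    }

    int entryCount(String path) {
        TreeMap<Long, Entry> revs = byPath.get(path);
        return revs == null ? 0 : revs.size();
    }

    int loadCount() {
        return loads;
    }
}
```

With this layout, reads of /foo/bar at revisions 100 and 105 share the single entry stored at 70, and a later read at 90 is served from the cache. The read at 105 still goes to the backend in this sketch; avoiding that as well would need some knowledge of which paths changed between 100 and 105.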

B - Using a persistent cache to manage restarts -
No matter what we do we would still make a large number of calls on
restart, as the complete state is managed in a remote system. Most of the
state we read again would not have changed. So it might be better to
persist the doc cache (as an L3 cache; L2 might be an off-heap cache),
say using MapDB [1] or an H2 database. If we can find a valid node at the
given revision in L3 then we serve it from there.

Upon restart we can check that the modCount of a document has not changed
(we can check multiple docs at once using an in clause) and then serve it
from L3.
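
A minimal sketch of the restart check in B, under stated assumptions: PersistentDocCache and CachedDoc are made-up names, an in-memory HashMap stands in for the MapDB or H2 store, and the modCountLookup callback stands in for one batched query (the in clause) that returns the current modCount for each requested id.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Oak code.
class PersistentDocCache {

    static final class CachedDoc {
        final String id;
        final long modCount;
        final String data;

        CachedDoc(String id, long modCount, String data) {
            this.id = id;
            this.modCount = modCount;
            this.data = data;
        }
    }

    // Stand-in for the persistent L3 store (MapDB, H2, ...).
    private final Map<String, CachedDoc> store = new HashMap<>();

    void put(CachedDoc doc) {
        store.put(doc.id, doc);
    }

    CachedDoc get(String id) {
        return store.get(id);
    }

    // Called once after restart. Asks the backend for the current modCount of
    // all cached ids in a single batch and drops every entry whose modCount
    // differs or whose document no longer exists. Returns the number dropped.
    int revalidate(Function<Collection<String>, Map<String, Long>> modCountLookup) {
        Map<String, Long> current = modCountLookup.apply(new ArrayList<>(store.keySet()));
        int dropped = 0;
        Iterator<CachedDoc> it = store.values().iterator();
        while (it.hasNext()) {
            CachedDoc doc = it.next();
            Long now = current.get(doc.id);
            if (now == null || now.longValue() != doc.modCount) {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }
}
```

In practice the ids would probably be sent in several batches instead of one very large in clause, and only the ids and modCounts would travel over the wire, which should be much cheaper than re-reading the full documents.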

C - Maintain an approximate estimate of childCount in the parent
In addition we can keep an approximate estimate of childCount and eagerly
fetch the child nodes if the number is small. We can possibly make use of
the primaryType of the node to make a better guess.
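
The decision in C could look roughly like the sketch below. ChildPrefetchPolicy, the threshold and the per-type hints are hypothetical; the only hint used here is that an nt:file node normally has a single jcr:content child.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Oak code.
class ChildPrefetchPolicy {

    private final int threshold;
    // primaryType -> typical child count, used only when no estimate is stored.
    private final Map<String, Integer> typeHints = new HashMap<>();

    ChildPrefetchPolicy(int threshold) {
        this.threshold = threshold;
    }

    ChildPrefetchPolicy hint(String primaryType, int typicalChildCount) {
        typeHints.put(primaryType, typicalChildCount);
        return this;
    }

    // storedEstimate is the approximate childCount kept in the parent,
    // or null if the parent carries no estimate.
    boolean shouldPrefetch(Integer storedEstimate, String primaryType) {
        Integer estimate = storedEstimate != null ? storedEstimate : typeHints.get(primaryType);
        // Without any estimate we stay lazy and do not prefetch.
        return estimate != null && estimate <= threshold;
    }
}
```

For example, with a threshold of 10 and the nt:file hint, a parent with a stored estimate of 3 would have its children fetched eagerly, while one with an estimate of 5000 would not.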

Thoughts?

Chetan Mehrotra
[1] http://www.mapdb.org/
