jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@day.com>
Subject Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
Date Thu, 24 Jul 2008 13:25:44 GMT
hi shaun

On Thu, Jul 24, 2008 at 1:26 PM, sbarriba <sbarriba@yahoo.co.uk> wrote:
> Hi Stefan et al,
>
> "DefaultISMLocking is used by SISM, i.e. at the bottom layer. SISM maintains
> a workspace-global cache of ItemState instances read from the persistence
> layer. this cache is not affected by session lifetime since it's shared
> among all sessions."
>
> OK that makes sense.
> To summarise what we're seeing, potential bottlenecks we think we're seeing
> and how we worked around them. Please note I'm not 100% familiar with the
> JackRabbit design so some conclusions may be wrong:
>
>  1) application uses Session to read a Node Property
>  2) SessionImpl delegates to ItemManager
>  3) ItemManager synchs on an itemCache (Contention Point 1: Session Wide)
>  4) On cache miss, ItemManager ultimately delegates to an SISM
>  5) SISM synchs on ISMLocking (Contention Point 2: Global or per item
> depending on DefaultISM or FineGrainedISM implementation)
>  6) On cache miss, SISM delegates to persistence manager
>  7) AbstractBundlePersistenceManager synchs on itself (Contention Point 3:
> On persistence Manager)
>
> In some cases our web application will read 2,000 or 3,000 Node properties
> to deliver a single page request.
>
> Initially we saw 7) as a bottleneck:
>  - can JackRabbit leverage multiple database connections if it's synched on a
> single persistence manager?

no. the PM would need to be adapted/rewritten in order to benefit from
multiple db connections.
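
A minimal sketch of why extra connections do not help while the load path locks on the persistence manager itself. SynchronizedStore is a hypothetical stand-in, not Jackrabbit code, and the 5 ms sleep is an assumed placeholder for a database round trip.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a persistence manager whose load method locks on the instance. */
final class SynchronizedStore {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    /** Simulates one database read; synchronized, so only one thread is inside at a time. */
    synchronized String load(String id) {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // assumed placeholder for waiting on the database
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            inFlight.decrementAndGet();
        }
        return "bundle:" + id;
    }

    int maxConcurrentLoads() {
        return maxInFlight.get();
    }
}

final class SynchronizedStoreDemo {
    /** Runs the given number of concurrent readers and reports the peak number inside load(). */
    static int peakConcurrency(int threads) {
        SynchronizedStore store = new SynchronizedStore();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String id = "node-" + i;
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all readers at the same moment
                    store.load(id);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.maxConcurrentLoads();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent loads: " + peakConcurrency(8));
    }
}
```

However many reader threads are started, the peak stays at one, so a second connection would sit idle unless the locking itself were changed.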

>  - we resolved this by configuring a large BundleCache
>
> We then saw 5) as a bottleneck:
>  - it seems that, as each node property is an item, every property read
> contends on ISMLocking. Is that correct? Is there scope for reading
> properties/lazy loading in bulk for an item?

that's what the bundle pm should actually be doing...

>  - we partly resolved this by moving from a "pooled session per view"
> pattern to a "shared session per view" pattern
>
> We now see contention occasionally on 3).

please note that a JCR session is not thread safe and should therefore not
be shared among multiple threads.

if you're experiencing lock contention on ItemManager.itemCache you
obviously do share sessions...
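
One way to keep a session confined to a single thread is to bind it to that thread. The sketch below assumes a generic type S in place of javax.jcr.Session and a supplier in place of the application's login call; PerThreadSessions is a hypothetical helper, not part of Jackrabbit or JCR.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hands each thread its own session-like object, so no instance is ever
 * used by two threads. S stands in for javax.jcr.Session; the supplier
 * stands in for whatever login call the application uses.
 */
final class PerThreadSessions<S> {
    private final ThreadLocal<S> current;

    PerThreadSessions(Supplier<S> login) {
        // withInitial runs the supplier once per thread, on that thread's first get().
        this.current = ThreadLocal.withInitial(login);
    }

    /** Returns the calling thread's own session, creating it on first use. */
    S get() {
        return current.get();
    }

    /** Forgets the calling thread's session, e.g. at the end of a request. */
    void release() {
        current.remove();
    }
}

final class PerThreadSessionsDemo {
    /** Returns the session that a freshly started thread receives. */
    static <S> S sessionSeenByNewThread(PerThreadSessions<S> sessions) {
        Object[] seen = new Object[1];
        Thread t = new Thread(() -> seen[0] = sessions.get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        @SuppressWarnings("unchecked")
        S result = (S) seen[0];
        return result;
    }

    public static void main(String[] args) {
        AtomicInteger logins = new AtomicInteger();
        PerThreadSessions<String> sessions =
            new PerThreadSessions<>(() -> "session-" + logins.incrementAndGet());

        String mine = sessions.get();                      // main thread's session
        String theirs = sessionSeenByNewThread(sessions);  // a different session
        System.out.println(mine + " / " + theirs + " / logins=" + logins.get());
    }
}
```

With a real session the caller would also log out before calling release(), otherwise a pooled worker thread keeps its session open between requests.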

>
> It feels like there is scope for improving the concurrency in a few places -
> plus consolidate the caching configuration which is currently different for
> BundleCache vs SISM etc.

absolutely agreed, and thanks for your feedback/analysis. that's very
much appreciated.

cheers
stefan

>
>
>
> Regards,
> Shaun
>
>
>
>
> -----Original Message-----
> From: stefan.guggisberg@gmail.com [mailto:stefan.guggisberg@gmail.com] On
> Behalf Of Stefan Guggisberg
> Sent: 21 July 2008 11:04
> To: users@jackrabbit.apache.org
> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>
> hi shaun
>
> On Sun, Jul 20, 2008 at 2:13 PM, sbarriba <sbarriba@yahoo.co.uk> wrote:
>> Hi Stefan,
>> So the intention is that once the session is no longer used then the
>> ItemImpl instances are cleared up?
>
> yes, unless ItemImpl instances are still being externally referenced
> by client code.
>
>> That makes sense except that when
>> investigating the lock contention issues we found that the creation of
>> ItemImpl can become expensive as they queue up on DefaultISMLocking.
>
> i don't think so. i guess there's a misunderstanding and you're confusing
> ItemImpl and ItemState instances.
>
> let me try to clear things up.
>
> ItemImpl (i.e. NodeImpl and PropertyImpl) instances implement the JCR
> interfaces javax.jcr.Node and javax.jcr.Property. they're dealt with at
> the top-most layer in jackrabbit and they're managed by
> o.a.j.core.ItemManager. there's one ItemManager per session.
> ItemImpl instance creation per se should never be expensive since they
> only encapsulate/wrap an ItemState instance.
>
> ItemState instances OTOH represent the core 'data' of a node/property.
> they're managed on 3 separate layers:
>  - transient (session local, SessionItemStateManager SISM)
>  - local (tx local, LocalItemStateManager LISM)
>  - shared (global, SharedItemStateManager SISM)
>
> DefaultISMLocking is used by SISM, i.e. at the bottom layer.
> SISM maintains a workspace-global cache of ItemState instances
> read from the persistence layer. this cache is not affected
> by session lifetime since it's shared among all sessions.
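
The shared-layer behaviour described above can be sketched roughly as follows. SharedStateCache and SketchSession are hypothetical stand-ins, states are plain strings, and the weak references and size management of the real cache are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical workspace-wide cache: one instance shared by every session. */
final class SharedStateCache {
    private final Map<String, String> states = new ConcurrentHashMap<>();
    private final Function<String, String> persistence;
    private final AtomicInteger loads = new AtomicInteger();

    SharedStateCache(Function<String, String> persistence) {
        this.persistence = persistence;
    }

    /** Returns the cached state, reading from persistence only on a miss. */
    String get(String id) {
        return states.computeIfAbsent(id, key -> {
            loads.incrementAndGet();
            return persistence.apply(key);
        });
    }

    /** Number of reads that actually reached the persistence layer. */
    int loadCount() {
        return loads.get();
    }
}

/** Hypothetical session: holds no shared state itself, only a reference to the shared cache. */
final class SketchSession {
    private final SharedStateCache shared;

    SketchSession(SharedStateCache shared) {
        this.shared = shared;
    }

    String read(String id) {
        return shared.get(id);
    }
}

final class SharedStateCacheDemo {
    public static void main(String[] args) {
        SharedStateCache cache = new SharedStateCache(id -> "state-of-" + id);
        new SketchSession(cache).read("/content/page"); // first session: miss, one persistence read
        new SketchSession(cache).read("/content/page"); // later session: served from the shared cache
        System.out.println("persistence reads: " + cache.loadCount());
    }
}
```

The first session is discarded before the second one reads, yet the second read does not reach persistence, which is the sense in which the cache is unaffected by session lifetime.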
>
> cheers
> stefan
>
>>
>> When relying on sessions to cache some item data (with a shared session per
>> request model) via the ItemManager we found that this significantly reduced
>> contention as clients using sessions with some ItemImpls didn't hit
>> DefaultISMLocking. By choosing a suitable X request per 1 session ratio we
>> could spread the locking to increase throughput.
>>
>> With a pooled session per view model (where each request exclusively has
>> access to one session) we found no benefit from the ItemManager cache due to
>> the Weak Referenced data being cleared up after each request.
>>
>> Are the LocalItemStateManager and SharedItemStateManager intended to help
>> reduce the load on DefaultISMLocking?
>>
>> Regards,
>> Shaun
>>
>>
>>
>> -----Original Message-----
>> From: stefan.guggisberg@gmail.com [mailto:stefan.guggisberg@gmail.com] On
>> Behalf Of Stefan Guggisberg
>> Sent: 16 July 2008 13:25
>> To: users@jackrabbit.apache.org
>> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>>
>> hi sean
>>
>> On Tue, Jul 1, 2008 at 7:11 PM, sbarriba <sbarriba@yahoo.co.uk> wrote:
>>> Hi Marcel et al,
>>> 3 suggestions come to mind from this (perhaps for the develop list):
>>>
>>> 1) the ItemManager should be using Soft References rather than Weak
>>> References otherwise a PooledSessionInView pattern is not really effective
>>> as pooled (but unused) sessions have their caches cleared immediately by
>>> the GC (using weak references).
>>
>> ItemManager caches ItemImpl instances. the 'cache' guarantees that there's
>> no more than 1 ItemImpl instance per item id and session. weak references
>> are ideal for this task. ItemManager is not meant to be a 'cache' since
>> ItemImpl instance creation is IMO not performance critical. i remember that
>> i once experimented with soft references but they tended to fill the heap
>> pretty fast since soft references are typically cleared only when you're
>> near an OOM error...
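
The "no more than 1 instance per id" guarantee can be sketched with a map of weak references. This is an illustration only: WeakCanonicalMap is a hypothetical class, W stands in for ItemImpl and K for an item id, and nothing here is taken from the ItemManager source.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Keeps at most one live wrapper per id without keeping wrappers alive itself.
 * W stands in for ItemImpl and K for an item id (both assumptions).
 */
final class WeakCanonicalMap<K, W> {
    private final Map<K, WeakReference<W>> refs = new HashMap<>();

    /** Returns the wrapper for id if one is still reachable, otherwise creates one. */
    synchronized W getOrCreate(K id, Function<K, W> factory) {
        WeakReference<W> ref = refs.get(id);
        W wrapper = (ref == null) ? null : ref.get();
        if (wrapper == null) {
            // Never created, or already collected: build a fresh wrapper.
            // Cleared entries for other ids are not purged in this sketch.
            wrapper = factory.apply(id);
            refs.put(id, new WeakReference<>(wrapper));
        }
        return wrapper;
    }
}

final class WeakCanonicalMapDemo {
    public static void main(String[] args) {
        WeakCanonicalMap<String, StringBuilder> items = new WeakCanonicalMap<>();
        StringBuilder first = items.getOrCreate("/a/b", id -> new StringBuilder(id));
        StringBuilder second = items.getOrCreate("/a/b", id -> new StringBuilder(id));
        // 'first' is still strongly referenced here, so both calls return one instance.
        System.out.println(first == second);
    }
}
```

Once client code drops its last strong reference, the wrapper becomes collectable and the next lookup simply builds a new one, so the map provides identity rather than caching.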
>>
>> ItemState caches are a different matter. LocalItemStateManager and
>> SharedItemStateManager do cache ItemState instances for performance
>> reasons. please take a look at the javadoc which should explain
>> why they're using weak references internally instead of soft references:
>>
>>
>> http://jackrabbit.apache.org/api/1.4/org/apache/jackrabbit/core/state/ItemStateReferenceCache.html
>>
>> cheers
>> stefan
>>
>>>
>>> 2) the CacheManager config needs to be externalised so it can be changed
>>> within the XML config, not programmatically.
>>>
>>> 3) it's worth considering using a caching library (e.g. ehcache) for the
>>> BundleCache at least? As a case study we've got multi-GB of binaries in
>>> BLOBs in the database and the BundleCache (at 100MB+) spends 2 hours after
>>> each restart filling /tmp. It would be great to use a caching library which
>>> supported a persistent cache etc. Obviously externalBlobs helps here.
>>>
>>> Regards,
>>> Shaun
>>>
>>> -----Original Message-----
>>> From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
>>> Sent: 01 July 2008 09:47
>>> To: users@jackrabbit.apache.org
>>> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>>>
>>> Hi,
>>>
>>> sbarriba wrote:
>>>> ..        PersistenceManager Cache:
>>>>
>>>> o   The "bundleCacheSize" determines how many nodes the PersistenceManager
>>>> will cache. As this determines the lifetime of the references to the
>>>> temporary BLOB cache, if it's not large enough BLOBs will be continually
>>>> read from the database (if using externalBlobs=false).
>>>>
>>>> o   Configurable in <PersistenceManager> XML block
>>>>
>>>> o   Default size 8MB
>>>>
>>>> o   This cache is shared by all sessions.
>>>>
>>>> o   Synchronised access using the ISMLocking strategy e.g. Default or
>>>> FineGrained
>>>
>>> correct, but there's additional synchronization in the persistence manager
>>> using conventional synchronized methods. e.g. see
>>> AbstractBundlePersistenceManager.load(NodeId)
>>>
>>>> ..        Session ItemManager Cache:
>>>>
>>>> o   Items are cached from the underlying persistence manager on a per
>>>> session basis.
>>>>
>>>> o   Limit cannot be set.
>>>
>>> not sure, but I think this cache is also managed (at least partially) by the
>>> CacheManager.
>>>
>>>> o   Uses a ReferenceMap which can be emptied by the JVM GC as required
>>>
>>> that's the 'other part' that manages the cache ;)
>>>
>>> items that are still referenced in the application will force the reference
>>> map to keep the respective ItemState instances (using weak references).
>>>
>>>> o   Synchronised access using the itemCache object
>>>>
>>>> ..        CacheManager Cache:
>>>>
>>>> o   Limit can only be set programmatically via the Workspace cacheManager
>>>>
>>>> o   http://wiki.apache.org/jackrabbit/CacheManager
>>>>
>>>> o   Defaults to 16MB
>>>>
>>>> o   It's not clear as yet how the CacheManager relates, if at all, to the
>>>> ItemManager cache
>>>
>>> this only happens indirectly. see above.
>>>
>>>> 2 questions:
>>>>
>>>> ..        What is the purpose of the CacheManager and which caches does it
>>>> actually control?
>>>
>>> It controls *all* the caches that contain ItemState instances.
>>>
>>>> ..        For example, for a workspace with 100,000 nodes what is an
>>>> appropriate setting for the Cache Manager?
>>>
>>> I guess that depends on your JVM heap settings and the usage pattern. if you
>>> have a lot of random reads over nearly all 100k nodes and performance is
>>> critical you may consider caching all of them. have a look at
>>> ItemState.calculateMemoryFootprint() for a formula on how the memory
>>> consumption is calculated.
>>>
>>> regards
>>>  marcel
>>>
>>>
>>>
>>
>>
>>
>
>
>
