jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@day.com>
Subject Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
Date Wed, 30 Jul 2008 11:58:16 GMT
hi shaun

On Wed, Jul 30, 2008 at 1:33 PM, sbarriba <sbarriba@yahoo.co.uk> wrote:
> Hi Stefan,
> RE " please note that a JCR session is not thread safe and should therefore
> not be shared among multiple threads."
>
> ...that's a concern when sharing sessions for read/write, but for read-only
> applications I understand it's safe to share sessions?

i wouldn't call it "safe" ;) it's true that there's been some effort in the past
to support thread-safe read-only sessions. however, it's only been a
best effort and it was never advertised as an official jackrabbit feature.

there's a reasonably high chance that you'll still encounter concurrency
issues when sharing read-only sessions. we therefore decided to entirely
discourage session sharing.

see http://wiki.apache.org/jackrabbit/JcrSessionHandling

cheers
stefan
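A minimal sketch of the session-per-request pattern recommended here. It assumes two hypothetical, stdlib-only stand-ins, `SimpleRepository` and `SimpleSession`, in place of `javax.jcr.Repository` and `javax.jcr.Session` (with the real API the equivalent calls are `repository.login(credentials)` and `session.logout()`). Each task opens its own session on its worker thread and always logs out in a `finally` block. The stand-in session deliberately fails fast if a second thread touches it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for javax.jcr.Session; deliberately NOT thread-safe.
final class SimpleSession {
    private final Thread owner = Thread.currentThread();
    private boolean live = true;

    String getProperty(String path) {
        if (Thread.currentThread() != owner) {   // fail fast on cross-thread use
            throw new IllegalStateException("session used by a foreign thread");
        }
        if (!live) {
            throw new IllegalStateException("session already logged out");
        }
        return "value-of:" + path;
    }

    void logout() {
        live = false;
    }
}

// Hypothetical stand-in for javax.jcr.Repository.
final class SimpleRepository {
    final AtomicInteger logins = new AtomicInteger();

    SimpleSession login() {
        logins.incrementAndGet();
        return new SimpleSession();              // owned by the calling thread
    }
}

final class SessionPerTask {
    // One session per request: open, use on this thread only, always log out.
    static String handleRequest(SimpleRepository repo, String path) {
        SimpleSession session = repo.login();
        try {
            return session.getProperty(path);
        } finally {
            session.logout();
        }
    }

    static List<String> runConcurrently(SimpleRepository repo, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                String path = "/content/page" + i;
                futures.add(pool.submit(() -> handleRequest(repo, path)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        SimpleRepository repo = new SimpleRepository();
        List<String> results = runConcurrently(repo, 10);
        System.out.println(results.size() + " requests served with " + repo.logins.get() + " logins");
    }
}
```

Short-lived sessions still benefit from the workspace-global ItemState cache described further down the thread, since that cache is shared by all sessions.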

>
> Regards,
> Shaun
>
> -----Original Message-----
> From: stefan.guggisberg@gmail.com [mailto:stefan.guggisberg@gmail.com] On
> Behalf Of Stefan Guggisberg
> Sent: 24 July 2008 14:26
> To: users@jackrabbit.apache.org
> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>
> hi shaun
>
> On Thu, Jul 24, 2008 at 1:26 PM, sbarriba <sbarriba@yahoo.co.uk> wrote:
>> Hi Stefan et al,
>>
>> "DefaultISMLocking is used by SISM, i.e. at the bottom layer. SISM
>> maintains a workspace-global cache of ItemState instances read from the
>> persistence layer. this cache is not affected by session lifetime since
>> it's shared among all sessions."
>>
>> OK that makes sense.
>> To summarise: here's what we're seeing, the potential bottlenecks we think
>> we've hit, and how we worked around them. Please note I'm not 100% familiar
>> with the JackRabbit design, so some conclusions may be wrong:
>>
>>  1) application uses Session to read a Node Property
>>  2) SessionImpl delegates to ItemManager
>>  3) ItemManager synchs on its itemCache (Contention Point 1: Session Wide)
>>  4) On cache miss, ItemManager ultimately delegates to an SISM
>>  5) SISM synchs on ISMLocking (Contention Point 2: global or per item,
>>     depending on the DefaultISMLocking or FineGrainedISMLocking
>>     implementation)
>>  6) On cache miss, SISM delegates to the persistence manager
>>  7) AbstractBundlePersistenceManager synchs on itself (Contention Point 3:
>>     the persistence manager)
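The read path in steps 1) to 7) can be modelled in a few lines of plain Java to see which lock each read touches. This is a simplified model, not Jackrabbit code: `ModelSession`, `ModelSharedStateManager` and `ModelPersistenceManager` are hypothetical names, the session cache here uses strong references (the real ItemManager uses weak ones), and a single `ReentrantReadWriteLock` stands in for DefaultISMLocking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Bottom layer, standing in for a bundle persistence manager whose load() is synchronized.
final class ModelPersistenceManager {
    final AtomicInteger loads = new AtomicInteger();

    synchronized String load(String id) {   // contention point 3: one monitor per PM
        loads.incrementAndGet();
        return "state-of-" + id;            // pretend this row came from the database
    }
}

// Middle layer, standing in for the workspace-wide shared item state manager.
final class ModelSharedStateManager {
    private final ReadWriteLock ismLocking = new ReentrantReadWriteLock();
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ModelPersistenceManager pm;
    final AtomicInteger lockAcquisitions = new AtomicInteger();

    ModelSharedStateManager(ModelPersistenceManager pm) { this.pm = pm; }

    String getItemState(String id) {
        ismLocking.readLock().lock();       // contention point 2: one lock for the workspace
        try {
            lockAcquisitions.incrementAndGet();
            return cache.computeIfAbsent(id, pm::load); // shared cache miss -> PM
        } finally {
            ismLocking.readLock().unlock();
        }
    }
}

// Top layer, standing in for a session and its ItemManager.
final class ModelSession {
    private final Map<String, String> itemCache = new HashMap<>();
    private final ModelSharedStateManager shared;

    ModelSession(ModelSharedStateManager shared) { this.shared = shared; }

    String readProperty(String id) {
        synchronized (itemCache) {          // contention point 1: one monitor per session
            String cached = itemCache.get(id);
            if (cached != null) {
                return cached;
            }
        }
        String item = "item(" + shared.getItemState(id) + ")"; // session cache miss
        synchronized (itemCache) {
            String existing = itemCache.putIfAbsent(id, item);
            return existing != null ? existing : item;
        }
    }
}

final class ReadPathDemo {
    public static void main(String[] args) {
        ModelPersistenceManager pm = new ModelPersistenceManager();
        ModelSharedStateManager shared = new ModelSharedStateManager(pm);
        ModelSession a = new ModelSession(shared);
        ModelSession b = new ModelSession(shared);
        a.readProperty("n1");   // goes through all three layers
        a.readProperty("n1");   // answered by session A's cache alone
        b.readProperty("n1");   // takes the shared lock, served from the shared cache
        System.out.println("shared lock acquisitions=" + shared.lockAcquisitions.get()
                + ", PM loads=" + pm.loads.get());
    }
}
```

In this model a repeat read in the same session never reaches the shared lock, and the first read in a second session takes the shared lock but is served from the workspace-global cache, so the persistence manager is hit only once.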
>>
>> In some cases our web application will read 2,000 or 3,000 Node properties
>> to deliver a single page request.
>>
>> Initially we saw 7) as a bottleneck:
>>  - can JackRabbit leverage multiple database connections if it's synched on
>>    a single persistence manager?
>
> no. the PM would need to be adapted/rewritten in order to benefit from
> multiple db connections.
>
>>  - we resolved this by configuring a large BundleCache
>>
>> We then saw 5) as a bottleneck:
>>  - it seems that, as each node property is an item, every property read
>>    contends on ISMLocking. Is that correct? Is there scope for reading
>>    properties/lazy loading in bulk per item?
>
> that's what the bundle pm should actually be doing...
>
>>  - we partly resolved this by moving from a "pooled session per view"
>>    pattern to a "shared session per view" pattern
>>
>> We now see contention occasionally on 3).
>
> please note that a JCR session is not thread safe and should therefore not
> be shared among multiple threads.
>
> if you're experiencing lock contention on ItemManager.itemCache, you
> obviously do share sessions...
>
>>
>> It feels like there is scope for improving the concurrency in a few places,
>> plus for consolidating the caching configuration, which is currently
>> different for BundleCache vs SISM etc.
>
> absolutely agreed, and thanks for your feedback/analysis. that's very
> much appreciated.
>
> cheers
> stefan
>
>>
>>
>>
>> Regards,
>> Shaun
>>
>>
>>
>>
>> -----Original Message-----
>> From: stefan.guggisberg@gmail.com [mailto:stefan.guggisberg@gmail.com] On
>> Behalf Of Stefan Guggisberg
>> Sent: 21 July 2008 11:04
>> To: users@jackrabbit.apache.org
>> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>>
>> hi shaun
>>
>> On Sun, Jul 20, 2008 at 2:13 PM, sbarriba <sbarriba@yahoo.co.uk> wrote:
>>> Hi Stefan,
>>> So the intention is that once the session is no longer used then the
>>> ItemImpl instances are cleared up?
>>
>> yes, unless ItemImpl instances are still being externally referenced
>> by client code.
>>
>>> That makes sense, except that when
>>> investigating the lock contention issues we found that creating
>>> ItemImpl instances can become expensive, as they queue up on DefaultISMLocking.
>>
>> i don't think so. i guess there's a misunderstanding and you're confusing
>> ItemImpl and ItemState instances.
>>
>> let me try to clear things up.
>>
>> ItemImpl (i.e. NodeImpl and PropertyImpl) instances implement the JCR
>> interfaces javax.jcr.Node and javax.jcr.Property. they're dealt with at
>> the top-most layer in jackrabbit and they're managed by
>> o.a.j.core.ItemManager. there's one ItemManager per session.
>> ItemImpl instance creation per se should never be expensive since they
>> only encapsulate/wrap an ItemState instance.
>>
>> ItemState instances OTOH represent the core 'data' of a node/property.
>> they're managed on 3 separate layers:
>>  - transient (session local, SessionItemStateManager SISM)
>>  - local (tx local, LocalItemStateManager LISM)
>>  - shared (global, SharedItemStateManager SISM)
>>
>> DefaultISMLocking is used by SISM, i.e. at the bottom layer.
>> SISM maintains a workspace-global cache of ItemState instances
>> read from the persistence layer. this cache is not affected
>> by session lifetime since it's shared among all sessions.
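To make the ItemImpl/ItemState split concrete, here is a hypothetical sketch (the `Demo*` classes are illustrative, not Jackrabbit types) of two sessions reading the same item: each session's item manager hands out its own cheap wrapper, but both wrappers point at the single ItemState held in the workspace-global shared cache. The transient and local layers are left out for brevity, and the per-session map uses strong references where the real ItemManager uses weak ones.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// The core data of a node or property (plays the role of an ItemState).
final class DemoItemState {
    final String id;
    final String value;

    DemoItemState(String id, String value) {
        this.id = id;
        this.value = value;
    }
}

// A thin per-session wrapper (plays the role of an ItemImpl); creating one is cheap.
final class DemoItem {
    private final DemoItemState state;

    DemoItem(DemoItemState state) { this.state = state; }

    DemoItemState state() { return state; }

    String getString() { return state.value; }
}

// Workspace-global cache shared by all sessions (the bottom, shared layer).
final class DemoSharedLayer {
    private final Map<String, DemoItemState> states = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger();

    DemoItemState get(String id) {
        return states.computeIfAbsent(id, k -> {
            loads.incrementAndGet();             // pretend this is a persistence manager read
            return new DemoItemState(k, "data:" + k);
        });
    }
}

// One per session, like o.a.j.core.ItemManager (strong map here for brevity).
final class DemoItemManager {
    private final Map<String, DemoItem> items = new HashMap<>();
    private final DemoSharedLayer shared;

    DemoItemManager(DemoSharedLayer shared) { this.shared = shared; }

    DemoItem getItem(String id) {
        return items.computeIfAbsent(id, k -> new DemoItem(shared.get(k)));
    }
}

final class LayersDemo {
    public static void main(String[] args) {
        DemoSharedLayer shared = new DemoSharedLayer();
        DemoItemManager session1 = new DemoItemManager(shared);
        DemoItemManager session2 = new DemoItemManager(shared);
        DemoItem a = session1.getItem("n1");
        DemoItem b = session2.getItem("n1");
        System.out.println("same wrapper: " + (a == b) + ", same state: " + (a.state() == b.state()));
    }
}
```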
>>
>> cheers
>> stefan
>>
>>>
>>> When relying on sessions to cache some item data via the ItemManager
>>> (with a shared session per request model), we found that this significantly
>>> reduced contention, as clients using sessions that already held some
>>> ItemImpls didn't hit DefaultISMLocking. By choosing a suitable ratio of
>>> X requests per session, we could spread the locking and increase throughput.
>>>
>>> With a pooled session per view model (where each request has exclusive
>>> access to one session), we found no benefit from the ItemManager cache, due
>>> to the weakly referenced data being cleared up after each request.
>>>
>>> Are the LocalItemStateManager and SharedItemStateManager intended to help
>>> reduce the load on DefaultISMLocking?
>>>
>>> Regards,
>>> Shaun
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: stefan.guggisberg@gmail.com [mailto:stefan.guggisberg@gmail.com] On
>>> Behalf Of Stefan Guggisberg
>>> Sent: 16 July 2008 13:25
>>> To: users@jackrabbit.apache.org
>>> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>>>
>>> hi shaun
>>>
>>> On Tue, Jul 1, 2008 at 7:11 PM, sbarriba <sbarriba@yahoo.co.uk> wrote:
>>>> Hi Marcel et al,
>>>> 3 suggestions come to mind from this (perhaps for the dev list):
>>>>
>>>> 1) the ItemManager should be using Soft References rather than Weak
>>>> References; otherwise a PooledSessionInView pattern is not really
>>>> effective, as pooled (but unused) sessions have their caches cleared
>>>> immediately by the GC (using weak references).
>>>
>>> ItemManager caches ItemImpl instances. the 'cache' guarantees that there's
>>> no more than 1 ItemImpl instance per item id and session. weak references
>>> are ideal for this task. ItemManager is not meant to be a 'cache' since
>>> ItemImpl instance creation is IMO not performance critical. i remember that
>>> i once experimented with soft references but they tended to fill the heap
>>> pretty fast, since soft references are typically cleared only when you're
>>> near an OOM error...
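The "no more than 1 ItemImpl instance per item id and session" guarantee is the classic canonicalizing map with weakly referenced values. A minimal stdlib sketch of that idea follows; `WeakCanonicalMap` is a hypothetical name, not the ReferenceMap-based code in ItemManager. While client code holds a wrapper, lookups return that same instance; once nothing references it, the GC may clear it and the next lookup simply creates a fresh one.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// At most one live value per key; values are only weakly held, so unused ones can be collected.
final class WeakCanonicalMap<K, V> {
    // A weak reference that remembers its key so cleared entries can be purged from the map.
    private static final class KeyedRef<A, B> extends WeakReference<B> {
        final A key;

        KeyedRef(A key, B value, ReferenceQueue<B> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, KeyedRef<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    synchronized V get(K key, Function<K, V> factory) {
        purgeCleared();
        KeyedRef<K, V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {                    // never created, or already collected
            value = factory.apply(key);
            map.put(key, new KeyedRef<>(key, value, queue));
        }
        return value;
    }

    synchronized int size() {
        purgeCleared();
        return map.size();
    }

    // Remove entries whose values the GC has cleared (unless already replaced).
    @SuppressWarnings("unchecked")
    private void purgeCleared() {
        Object polled;
        while ((polled = queue.poll()) != null) {
            KeyedRef<K, V> ref = (KeyedRef<K, V>) polled;
            map.remove(ref.key, ref);
        }
    }
}

final class WeakCanonicalDemo {
    public static void main(String[] args) {
        WeakCanonicalMap<String, Object> items = new WeakCanonicalMap<>();
        Object first = items.get("n1", id -> new Object());
        Object again = items.get("n1", id -> new Object());
        System.out.println("same instance while referenced: " + (first == again));
    }
}
```

Replacing `WeakReference` with `SoftReference` in this sketch would keep unused wrappers alive until the JVM runs low on memory, which matches the heap-filling behaviour described above.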
>>>
>>> ItemState caches are a different matter. LocalItemStateManager and
>>> SharedItemStateManager do cache ItemState instances for performance
>>> reasons. please take a look at the javadoc which should explain
>>> why they're using weak references internally instead of soft references:
>>>
>>> http://jackrabbit.apache.org/api/1.4/org/apache/jackrabbit/core/state/ItemStateReferenceCache.html
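As I read the linked javadoc, ItemStateReferenceCache pairs a weak-reference map (so there is only one ItemState instance per id while it is in use) with a secondary cache holding strong references to recently used states. The sketch below shows that two-level idea with a count-bounded LRU built on `LinkedHashMap`. `HybridStateCache` is a hypothetical name; in Jackrabbit the secondary cache is, I believe, memory-bounded and governed by the CacheManager rather than counted in entries.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Primary: weak map (one instance per key while anyone still uses it).
// Secondary: bounded LRU of strong references (keeps recently used instances alive).
final class HybridStateCache<K, V> {
    private final Map<K, WeakReference<V>> primary = new HashMap<>();
    private final LinkedHashMap<K, V> secondary;

    HybridStateCache(final int maxStrong) {
        // Access-ordered LinkedHashMap that drops its least recently used entry when full.
        this.secondary = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxStrong;
            }
        };
    }

    synchronized void put(K key, V value) {
        primary.put(key, new WeakReference<>(value));
        secondary.put(key, value);
    }

    // Returns the cached instance if it is still reachable, otherwise null.
    synchronized V get(K key) {
        WeakReference<V> ref = primary.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            primary.remove(key);                // drop a stale (cleared) entry
            return null;
        }
        secondary.put(key, value);              // refresh LRU position and re-pin strongly
        return value;
    }

    synchronized boolean isStronglyRetained(K key) {
        return secondary.containsKey(key);
    }
}

final class HybridCacheDemo {
    public static void main(String[] args) {
        HybridStateCache<String, Object> cache = new HybridStateCache<>(2);
        Object a = new Object();
        cache.put("a", a);
        cache.put("b", new Object());
        cache.put("c", new Object());           // pushes "a" out of the strong LRU
        System.out.println("a pinned: " + cache.isStronglyRetained("a")
                + ", a still found: " + (cache.get("a") == a));
    }
}
```

The weak primary map is what keeps identity intact even for states the LRU has dropped, as long as something else still references them; the strong secondary cache is what actually saves re-reads.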
>>>
>>> cheers
>>> stefan
>>>
>>>>
>>>> 2) the CacheManager config needs to be externalised so it can be changed
>>>> within the XML config, not programmatically.
>>>>
>>>> 3) is it worth considering a caching library (e.g. ehcache) for the
>>>> BundleCache at least? As a case study, we've got multiple GB of binaries
>>>> in BLOBs in the database, and the BundleCache (at 100MB+) spends 2 hours
>>>> after each restart filling /tmp. It would be great to use a caching
>>>> library which supported a persistent cache etc. Obviously externalBlobs
>>>> helps here.
>>>>
>>>> Regards,
>>>> Shaun
>>>>
>>>> -----Original Message-----
>>>> From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
>>>> Sent: 01 July 2008 09:47
>>>> To: users@jackrabbit.apache.org
>>>> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>>>>
>>>> Hi,
>>>>
>>>> sbarriba wrote:
>>>>> ..        PersistenceManager Cache:
>>>>>
>>>>> o   The "bundleCacheSize" determines how many nodes the
>>>>> PersistenceManager will cache. As this determines the lifetime of the
>>>>> references to the temporary BLOB cache, if it's not large enough BLOBs
>>>>> will be continually read from the database (if using
>>>>> externalBlobs=false).
>>>>>
>>>>> o   Configurable in <PersistenceManager> XML block
>>>>>
>>>>> o   Default size 8MB
>>>>>
>>>>> o   This cache is shared by all sessions.
>>>>>
>>>>> o   Synchronised access using the ISMLocking strategy, e.g. Default or
>>>>> FineGrained
>>>>
>>>> correct, but there's additional synchronization in the persistence
>>>> manager using conventional synchronized methods, e.g. see
>>>> AbstractBundlePersistenceManager.load(NodeId)
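For the ISMLocking part of the bullet above, the difference between a single workspace-wide lock and a finer-grained strategy can be sketched as lock striping. This assumes, as a simplification, that FineGrainedISMLocking behaves roughly like read/write locks striped by item id; `LockingStrategy`, `GlobalLocking`, `StripedLocking` and `LockingDemo` are hypothetical names, not the real ISMLocking interface. The demo holds a write lock on one item and checks whether another thread can immediately read a different item.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical abstraction: which read/write lock guards a given item.
interface LockingStrategy {
    ReadWriteLock lockFor(String itemId);
}

// One lock for the whole workspace: a writer on any item blocks readers of every item.
final class GlobalLocking implements LockingStrategy {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    @Override
    public ReadWriteLock lockFor(String itemId) {
        return lock;
    }
}

// Lock striping: items hash onto a fixed set of locks, so unrelated items rarely contend.
final class StripedLocking implements LockingStrategy {
    private final ReadWriteLock[] stripes;

    StripedLocking(int stripeCount) {
        stripes = new ReadWriteLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    @Override
    public ReadWriteLock lockFor(String itemId) {
        return stripes[Math.floorMod(itemId.hashCode(), stripes.length)];
    }
}

final class LockingDemo {
    // True if another thread cannot immediately read readId while this thread writes writeId.
    static boolean readerBlocked(LockingStrategy strategy, String writeId, String readId) {
        ReadWriteLock writeLock = strategy.lockFor(writeId);
        writeLock.writeLock().lock();
        try {
            boolean[] acquired = new boolean[1];
            Thread reader = new Thread(() -> {
                ReadWriteLock readLock = strategy.lockFor(readId);
                if (readLock.readLock().tryLock()) {  // non-blocking attempt
                    acquired[0] = true;
                    readLock.readLock().unlock();
                }
            });
            reader.start();
            reader.join();                            // join makes acquired[0] visible here
            return !acquired[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            writeLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("global: " + readerBlocked(new GlobalLocking(), "a", "b"));
        System.out.println("striped: " + readerBlocked(new StripedLocking(16), "a", "b"));
    }
}
```

With one global lock, a writer stalls every reader in the workspace; with striping, only items that hash to the same stripe contend. Neither strategy removes the separate `synchronized` section in the persistence manager that Marcel points out.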
>>>>
>>>>> ..        Session ItemManager Cache:
>>>>>
>>>>> o   Items are cached from the underlying persistence manager on a per
>>>>> session basis.
>>>>>
>>>>> o   Limit cannot be set.
>>>>
>>>> not sure, but I think this cache is also managed (at least partially) by
>>>> the CacheManager.
>>>>
>>>>> o   Uses a ReferenceMap which can be emptied by the JVM GC as required
>>>>
>>>> that's the 'other part' that manages the cache ;)
>>>>
>>>> items that are still referenced in the application will force the
>>>> reference map to keep the respective ItemState instances (using weak
>>>> references).
>>>>
>>>>> o   Synchronised access using the itemCache object
>>>>>
>>>>> ..        CacheManager Cache:
>>>>>
>>>>> o   Limit can only be set programmatically via the Workspace
>>>>> cacheManager
>>>>>
>>>>> o   http://wiki.apache.org/jackrabbit/CacheManager
>>>>>
>>>>> o   Defaults to 16MB
>>>>>
>>>>> o   It's not clear as yet how the CacheManager relates, if at all, to
>>>>> the ItemManager cache
>>>>
>>>> this only happens indirectly. see above.
>>>>
>>>>> 2 questions:
>>>>>
>>>>> ..        What is the purpose of the CacheManager and which caches does
>>>>> it actually control?
>>>>
>>>> It controls *all* the caches that contain ItemState instances.
>>>>
>>>>> ..        For example, for a workspace with 100,000 nodes what is an
>>>>> appropriate setting for the Cache Manager?
>>>>
>>>> I guess that depends on your JVM heap settings and the usage pattern. if
>>>> you have a lot of random reads over nearly all 100k nodes and performance
>>>> is critical, you may consider caching all of them. have a look at
>>>> ItemState.calculateMemoryFootprint() for a formula on how the memory
>>>> consumption is calculated.
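Since the CacheManager budgets memory rather than entry counts, sizing comes down to the estimated footprint per ItemState times the number of states you want resident. The sketch below is a memory-bounded LRU in that spirit (similar in idea to Jackrabbit's MLRUItemStateCache, not its code). `MemoryBoundedLru` is a hypothetical name, and the footprint function is a placeholder for whatever ItemState.calculateMemoryFootprint() actually computes.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// LRU cache bounded by an estimated memory budget rather than an entry count.
final class MemoryBoundedLru<K, V> {
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
    private final ToLongFunction<V> footprint;  // hypothetical per-entry size estimate
    private final long maxBytes;
    private long usedBytes;

    MemoryBoundedLru(long maxBytes, ToLongFunction<V> footprint) {
        this.maxBytes = maxBytes;
        this.footprint = footprint;
    }

    synchronized V get(K key) {
        return entries.get(key);                // also marks the entry as most recently used
    }

    synchronized void put(K key, V value) {
        V old = entries.put(key, value);
        if (old != null) {
            usedBytes -= footprint.applyAsLong(old);
        }
        usedBytes += footprint.applyAsLong(value);
        // Evict least recently used entries until the estimate fits the budget again.
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= footprint.applyAsLong(it.next().getValue());
            it.remove();
        }
    }

    synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}

final class MemoryLruDemo {
    public static void main(String[] args) {
        // Budget of 10 "bytes", using string length as a stand-in footprint.
        MemoryBoundedLru<String, String> cache = new MemoryBoundedLru<>(10, String::length);
        cache.put("a", "xxxx");
        cache.put("b", "yyyy");
        cache.get("a");                         // "a" is now more recent than "b"
        cache.put("c", "zzzz");                 // over budget: evicts "b"
        System.out.println("b cached: " + cache.contains("b") + ", used=" + cache.usedBytes());
    }
}
```

Reading an entry marks it as most recently used, so under memory pressure the cache evicts the states that have gone longest without access.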
>>>>
>>>> regards
>>>>  marcel
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
