jackrabbit-users mailing list archives

From: James Abley <james.ab...@gmail.com>
Subject: Hot lock in BitsetENTCacheImpl
Date: Fri, 03 Dec 2010 14:07:31 GMT
Hi,

Jackrabbit core 1.4.5 running on Sun Java 1.6.0_20.

We are seeing intermittent slowdowns in our application. Thread dumps show
that a lot of requests are blocked with a stack trace similar to the
following:

   java.lang.Thread.State: BLOCKED (on object monitor)
        at EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap.size(Unknown Source)
        - waiting to lock <0x8dfea658> (a EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap)
        at org.apache.jackrabbit.core.nodetype.BitsetENTCacheImpl.getKey(BitsetENTCacheImpl.java:86)
        at org.apache.jackrabbit.core.nodetype.NodeTypeRegistry.getEffectiveNodeType(NodeTypeRegistry.java:1049)
        at org.apache.jackrabbit.core.nodetype.NodeTypeRegistry.getEffectiveNodeType(NodeTypeRegistry.java:478)
        at org.apache.jackrabbit.core.NodeImpl.getEffectiveNodeType(NodeImpl.java:854)
        at org.apache.jackrabbit.core.NodeImpl.isNodeType(NodeImpl.java:1237)
        at org.apache.jackrabbit.core.NodeImpl.isNodeType(NodeImpl.java:2608)
        at com.example.contentrepository.jcr.JcrFactory.wrap(JcrFactory.java:51)
        at com.example.contentrepository.jcr.NodeIteratorImpl.next(NodeIteratorImpl.java:73)
        at com.example.contentrepository.jcr.NodeIteratorImpl.next(NodeIteratorImpl.java:22)

Examining that call stack:

call NodeImpl.isNodeType(String)

- a simple pass-through to the Name-based overload

call NodeImpl.isNodeType(Name)

- Retrieves the NodeTypeRegistry from the SessionContext, which in turn
retrieves it from the RepositoryContext. The NodeTypeRegistry is created
at startup for the repository, so it is a shared resource across the
entire Repository.

call NodeTypeRegistry.getEffectiveNodeType(Name, Set<Name>)

- since we use mixins on a lot of our content, this method merges the
primary Name and the mixin Set into a single array and delegates to

call getEffectiveNodeType(Name[], EffectiveNodeTypeCache, Map<Name,
QNodeTypeDefinition>)

- The EffectiveNodeTypeCache is a BitsetENTCacheImpl created at repository
startup. First, this method tries to get an EffectiveNodeTypeCache.Key:

call BitsetENTCacheImpl.getKey(Name[])

The implementation of this method looks like:

    public Key getKey(Name[] ntNames) {
        return new BitSetKey(ntNames, nameIndex.size() + ntNames.length);
    }

The call to nameIndex.size() is where the contention is. nameIndex is a
final EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap ivar.

The implementation of its size method looks like:

  public synchronized int size() {
    return count;
  }


Pretty obvious why we're getting the reported thread dumps and slowdowns.
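
If anyone wants to see the effect in isolation, here's a minimal
standalone sketch (my own code, nothing Jackrabbit-specific): a trivial
synchronized getter hammered from many threads. Every caller has to take
the same object monitor, which is exactly the BLOCKED state in the dump
above:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class HotLockDemo {
        private int count = 0;

        // The read itself is trivial, but every caller must acquire the
        // same object monitor first, like ConcurrentReaderHashMap.size().
        public synchronized int size() {
            return count;
        }

        public static void main(String[] args) throws InterruptedException {
            final HotLockDemo map = new HotLockDemo();
            ExecutorService pool = Executors.newFixedThreadPool(50);
            for (int i = 0; i < 50; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        // Stand-in for a request thread calling
                        // isNodeType() for every node in a NodeIterator.
                        for (int j = 0; j < 10000000; j++) {
                            map.size();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            // A thread dump taken while this runs shows most pool threads
            // BLOCKED waiting on the HotLockDemo monitor.
        }
    }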


In summary, calling NodeImpl.isNodeType() will always attempt to acquire
a single Repository-wide lock, at least in Jackrabbit 1.4.5. That's a
pretty old version, though. Should I raise this in JIRA?


I guess I need to either:

   1. Upgrade Jackrabbit to a version that doesn't suffer from this issue
   (if there is one?). We're long overdue doing this. Something that uses
   j.u.c.ConcurrentHashMap instead, which attempts to provide an
   unsynchronized implementation of the size method, would be good (see
   the first sketch below). We don't need JCR 2 features, though.
   2. Alternatively, alter our application code to call
   NodeImpl.isNodeType far less often (see the second sketch below).
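
To illustrate what I mean by option 1, here's a minimal sketch (my own
stand-in code, not the actual Jackrabbit or jcr2spi source; "Name" is
replaced by String and the BitSetKey construction is elided) of the same
getKey() hot path over a java.util.concurrent.ConcurrentHashMap:

    import java.util.concurrent.ConcurrentHashMap;

    public class GetKeySketch {
        private final ConcurrentHashMap<String, Integer> nameIndex =
                new ConcurrentHashMap<String, Integer>();

        public int keySizeHint(String[] ntNames) {
            // ConcurrentHashMap.size() sums per-segment counts and only
            // falls back to locking when the counts keep changing, so
            // concurrent callers don't queue on one map-wide monitor.
            return nameIndex.size() + ntNames.length;
        }
    }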
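And a rough sketch of option 2 (hypothetical code, not what we run today;
it assumes getPrimaryNodeType()/getMixinNodeTypes() don't take the same
lock, and it goes stale if node type definitions are re-registered, which
for us only happens during deployments): cache the isNodeType() answer
per (primary type, mixins, tested type), so each distinct combination
only hits the NodeTypeRegistry once:

    import java.util.Arrays;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.jcr.Node;
    import javax.jcr.RepositoryException;
    import javax.jcr.nodetype.NodeType;

    public class IsNodeTypeCache {
        private final Map<String, Boolean> cache =
                new ConcurrentHashMap<String, Boolean>();

        public boolean isNodeType(Node node, String typeName)
                throws RepositoryException {
            String key = buildKey(node, typeName);
            Boolean cached = cache.get(key);
            if (cached == null) {
                // Only the first lookup per combination pays the
                // NodeTypeRegistry (and hence the hot lock) cost.
                cached = Boolean.valueOf(node.isNodeType(typeName));
                cache.put(key, cached);
            }
            return cached.booleanValue();
        }

        // Key = primary type + sorted mixin names + the tested type.
        private String buildKey(Node node, String typeName)
                throws RepositoryException {
            NodeType[] mixins = node.getMixinNodeTypes();
            String[] names = new String[mixins.length];
            for (int i = 0; i < mixins.length; i++) {
                names[i] = mixins[i].getName();
            }
            Arrays.sort(names);
            StringBuilder sb =
                    new StringBuilder(node.getPrimaryNodeType().getName());
            for (int i = 0; i < names.length; i++) {
                sb.append('|').append(names[i]);
            }
            return sb.append('|').append(typeName).toString();
        }
    }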

I've not found any mentions of other people suffering from this
implementation detail.

Does anyone have suggestions for an upgrade path, ideally to a version
that doesn't suffer from the same contention issue? jackrabbit-core still
seems to have the same problem in trunk at the time of writing. There is
jackrabbit-jcr2spi, which contains an implementation of
EffectiveNodeTypeCache, also named BitsetENTCacheImpl, that uses
java.util.concurrent.ConcurrentHashMap, but I'm not clear on what the
jcr2spi module is used for.

Some more background on our setup, which constrains the upgrade options:

   - Single cluster with multiple nodes.
   - Database Persistence Manager.
   - Shared DataStore (using NFS).
   - Local Lucene indexes.
   - Lots of existing content.
   - 2 nodes responsible for writing - some downtime tolerated for small
   windows at pre-arranged intervals.
   - All nodes responsible for reading - potentially can take a node
   completely off-line while Lucene indexes rebuild, etc.
   - Presumably database schema changes would be harder to tolerate with a
   shared database as the persistence store.

Cheers,

James
