Concurrency control (JCR) edited by Jukka Zitting

Page: http://cwiki.apache.org/confluence/display/JCR/Concurrency+control

---------------------------------------------------------------------

The internal concurrency model in Apache Jackrabbit is fairly complex and a number of deadlock issues have been reported and fixed over the Jackrabbit 1.x release cycle. This document is the result of a design and code review targeted at proactively preventing other similar issues.

This document is about the internal concurrency and synchronization model in Jackrabbit, _not_ about the JCR locking feature. Note that the review that led to this document targeted concurrency control at an architectural level and did not focus much on issues like the thread-safety of individual classes or components. This review is based on Jackrabbit version 1.5 in the default configuration.

h2. Architectural background

In terms of concurrency control, the Jackrabbit architecture can roughly be divided into five main layers:

# Cluster
# Repository
# Workspace
# Session
# Transaction

The clustering layer takes care of synchronizing changes across one or more cluster nodes that are each treated as individual repositories that happen to share their content. Concurrency control across different cluster nodes is achieved using a single write lock that a cluster node is required to acquire before it can commit any changes to the shared state. On the other hand, all cluster nodes can read the shared content in parallel with no explicit synchronization.

Note that since the cluster nodes only share a single lock, a deadlock cannot occur between the locks in one node and the ones in another. A single deadlocked node can still potentially block writes to the entire cluster, but the clustering feature cannot add any new deadlock scenarios if each node is deadlock-free by itself.
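The pattern is easy to state in code. The sketch below is a hypothetical illustration of the single cluster-wide write lock described above; the ClusterLock interface and ClusterNode class are invented names for illustration, not Jackrabbit APIs, and the real Journal implementation differs in detail.

{code:java}
import java.util.function.Supplier;

// Hypothetical sketch only: ClusterLock and ClusterNode are invented names.
interface ClusterLock {
    void acquire() throws InterruptedException; // exclusive across all cluster nodes
    void release();
}

class ClusterNode {
    private final ClusterLock globalWriteLock;

    ClusterNode(ClusterLock globalWriteLock) {
        this.globalWriteLock = globalWriteLock;
    }

    /** Writes first acquire the single cluster-wide lock. */
    void commit(Runnable changes) throws InterruptedException {
        globalWriteLock.acquire();
        try {
            changes.run(); // publish the change set to the shared storage
        } finally {
            globalWriteLock.release();
        }
    }

    /** Reads proceed in parallel with no cluster-wide synchronization. */
    <T> T read(Supplier<T> query) {
        return query.get();
    }
}
{code}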
The repository layer takes care of all global repository state like the node type registry and the version storage. Instead of a single global repository lock, all the repository-wide components have their own synchronization mechanisms. The most notable component from a concurrency control point of view is the version storage, which actually contains two locking mechanisms: one in VersionManagerImpl for high-level versioning operations and one in the underlying SharedItemStateManager for controlling access to the underlying persistence mechanism.

A repository consists of one or more workspaces that contain the normal content trees of the repository. Each workspace consists of a few components like the persistence mechanism and the search index. The persistence mechanism is built from a SharedItemStateManager that controls all item operations and a PersistenceManager that persists items in permanent storage. Most persistence managers use Java synchronization or some other locking mechanism for concurrency control, but since they typically don't interact much with other parts of the repository, they are not that critical from a global concurrency perspective. On the other hand, the SharedItemStateManager, which uses a read-write lock, is a key element, especially given the way it interacts with the repository-wide version store. Note that since Jackrabbit 1.4 it has been possible to configure the locking strategy of the SharedItemStateManager to use a more fine-grained set of locks that allow concurrent write access to different parts of the content tree. This review focuses on the default case of having just a single SharedItemStateManager lock, but from a locking perspective the more fine-grained case is roughly equivalent to having more workspaces, and thus the results of this review should still apply.

Each workspace can be accessed by zero or more JCR sessions. Each session contains a transient space that keeps track of all unsaved changes in that session. Since the transient space is local to a session, and since a session should only be accessed by one thread at a time, there are few concurrency concerns associated with the use of sessions. However, note that the thread-safety requirements of sessions are in many cases not explicitly enforced by Jackrabbit, so a client that intentionally or accidentally uses a single session from multiple concurrent threads may well end up corrupting the internal state of the session.

Transactions are handled in Jackrabbit by wrapping all item operations (saved transient changes and direct workspace updates, as well as versioning and locking operations) into a sort of larger transient space that gets persisted only when the transaction is committed. There is no "transaction lock" in Jackrabbit, but transaction support still fundamentally changes Jackrabbit concurrency control as it basically replaces all write operations (and their related locking) with the larger commit operation. This transaction mode is only activated when a session is used within the context of an XA transaction.
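As an illustration of when the transaction mode kicks in, the following sketch drives a Jackrabbit session through an XA transaction by hand. Normally a JTA transaction manager issues these calls; the Xid is assumed to be supplied by it, and the call sequence shown is a simplified two-phase commit.

{code:java}
import javax.jcr.Session;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;
import org.apache.jackrabbit.api.XASession;

public class XaCommitSketch {
    // Sketch only: a real JTA transaction manager performs this sequence.
    static void writeInTransaction(Session session, Xid xid) throws Exception {
        XAResource resource = ((XASession) session).getXAResource();
        resource.start(xid, XAResource.TMNOFLAGS);
        session.getRootNode().addNode("example");
        session.save(); // buffered in the transaction, not yet persisted
        resource.end(xid, XAResource.TMSUCCESS);
        resource.prepare(xid);
        resource.commit(xid, false); // the locks are acquired here, see below
    }
}
{code}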
h2. Main synchronization mechanisms

The main synchronization mechanisms in Jackrabbit are the read-write locks in the SharedItemStateManager and VersionManagerImpl classes. Other components also have concurrency control features; for example, the LockManagerImpl class (used for handling JCR locks) uses a reentrant lock and the NodeTypeRegistry class relies on Java synchronization. This review focuses on just the two main components as those are by far the most actively used and the ones that could potentially block all access to repository content in case of a deadlock.

The three main locks to be concerned about are:

* "Workspace lock", the read-write lock of the per-workspace SharedItemStateManager
* "Versioning lock", the read-write lock of the repository-wide VersionManagerImpl
* "Version store lock", the read-write lock of the SharedItemStateManager associated with the version manager

Each of these locks can be acquired exclusively for write access or shared for read access. In other words, any number of concurrent readers can hold the lock, but a single writer will block out all other readers and writers.

As noted in the section above, the workspace locks may also be collections of more fine-grained locks, but this review concentrates on the default case. Note also that each workspace has its own lock, so even if one workspace is exclusively locked, other workspaces can still be accessed.

h2. Conditions for deadlocks

A deadlock can only occur if the holder of one lock tries to acquire another lock and there is another thread (or a series of other threads) that tries to do the reverse. This situation can only arise if a) locks are acquired in a nested sequence, b) different threads can acquire the nested locks in a different order, and c) at least two exclusive locks are being acquired.

Most operations in Jackrabbit avoid deadlocks in one of the following three ways:

* Only a single lock is held at a time, breaking condition a. This case covers most of the code doing sanity checks and other preparatory work associated with many operations.
* In case of nested locks, the code guarded by the inner lock never tries to acquire another lock, breaking condition b. This case covers for example the numerous calls to the underlying persistence managers that typically have their own synchronization mechanism but never call out to other Jackrabbit components, except perhaps the namespace registry, which also satisfies this condition.
* None of the nested locks are exclusive, breaking condition c. This covers all read operations in Jackrabbit, so a deadlock can never occur if all clients only read from the repository.

The potentially troublesome cases are two or more concurrent write operations with nested locks, or a write operation with two nested exclusive locks running concurrently with read operations with nested locks. See below for the results of the code review that tried to identify and clear such cases. The acquired write locks are marked in bold to make it easy to spot potential problems.
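The three conditions can be demonstrated in isolation. The sketch below uses plain java.util.concurrent locks (Jackrabbit itself uses its own read-write lock implementation) with two invented lock names: both threads nest two exclusive locks (conditions a and c) in opposite orders (condition b), so nearly every run deadlocks.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderDeadlock {
    // Invented stand-ins for the "workspace" and "version store" locks.
    static final ReentrantReadWriteLock workspace = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock versionStore = new ReentrantReadWriteLock();

    public static void main(String[] args) {
        new Thread(() -> nested(workspace, versionStore)).start(); // workspace first
        new Thread(() -> nested(versionStore, workspace)).start(); // reverse order
    }

    static void nested(ReentrantReadWriteLock outer, ReentrantReadWriteLock inner) {
        outer.writeLock().lock(); // condition a: nesting, condition c: exclusive
        try {
            sleep(100);                // widen the race window
            inner.writeLock().lock(); // blocks forever once the other thread holds it
            inner.writeLock().unlock();
        } finally {
            outer.writeLock().unlock();
        }
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}

Breaking any one of the conditions (a single lock, a fixed acquisition order, or read locks only) makes the deadlock impossible, which is exactly the pattern the review below looks for.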
h2. Code review

This section contains the results of a code review whose purpose was to identify the order and nesting of the locks acquired by many common operations in Jackrabbit. The results of the review were compared to the above conditions for deadlock.

Note that the descriptions of the write operations below assume a non-transactional context. See the last subsection for the behaviour in transactional environments.

h3. Normal read access

Read access to the workspace typically only requires a read lock on the SharedItemStateManager of that workspace, but since the version store is mapped to the virtual /jcr:system/jcr:versionStorage tree inside the repository, there are cases where the read lock of the version store also needs to be acquired.

# Workspace read lock, for reading normal node content
## Version store read lock, for reading version content

This nested lock is potentially unsafe in a transactional context; see the subsection on transaction commit below for more details.

h3. Versioning read access

Some version accesses are handled directly through the version manager instead of looking through the /jcr:system/jcr:versionStorage tree. Such accesses are guarded with the VersionManagerImpl read lock.

# Versioning read lock, for accessing version information
## Version store read lock, for reading version information

The nested lock here is safe as the version store lock never covers code that tries to acquire the versioning lock.

h3. Transient changes

All transient changes like those created by Node.addNode() or Session.move() are stored in the session-local transient space without needing any synchronization, except for the read locks used for accessing the underlying workspace state. A write lock is only needed when the accumulated changes are being persisted using the save() call described below.

h3. Save

The ItemImpl.save() method (which SessionImpl.save() also calls) collects all current transient changes into a single change log that is then persisted as an atomic update. Any new versionable nodes will cause empty version histories to be created in the version store. Note that ItemImpl.save() is synchronized on the current session, enforcing the rule that no two threads should be concurrently using the same session.

# Workspace read lock, for sanity checks and other preliminary work
# Multiple non-overlapping instances of the following (only when creating new version histories)
## Workspace read lock, for checking the current state of the nodes being modified
## Version store read lock, for checking whether a version history already exists
## Versioning *write lock*, for creating a new version history
### Version store *write lock*, for persisting the version history
# Workspace *write lock*, for persisting the changes
## Version store read lock, for checking references
## Version store *write lock*, for persisting updated back-references

Many of the other write operations below call ItemImpl.save() internally to persist changes in the current workspace. However, in the descriptions I've only included the last "Workspace write lock" branch (with the "Version store write lock" excluded if it's clear that no back-references need to be updated) as the operations are guaranteed to never contain cases where new version histories would need to be created.

Here we have three cases of nested locks involving one or more exclusive locks:

* Versioning write lock -> Version store write lock
* Workspace write lock -> Version store read lock
* Workspace write lock -> Version store write lock

All these nested locks are safe in a non-transactional context since the version store lock never covers code that tries to acquire one of the other locks. The same is true for the first case also in a transactional context, but see the transaction commit subsection below for a discussion of how the other two cases are different with transactions.
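As a usage-level illustration, the following sketch triggers the full lock sequence above by saving a new mix:versionable node. It assumes a local TransientRepository and the conventional admin/admin credentials; any repository and valid credentials would do.

{code:java}
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import org.apache.jackrabbit.core.TransientRepository;

public class SaveSketch {
    public static void main(String[] args) throws Exception {
        Repository repository = new TransientRepository();
        Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // Transient changes: no workspace lock is taken yet.
            Node node = session.getRootNode().addNode("example");
            node.addMixin("mix:versionable"); // forces a new, empty version history
            // save() now runs the sequence above: workspace read lock for the
            // checks, versioning and version store write locks for the new
            // history, and finally the workspace write lock for the change log.
            session.save();
        } finally {
            session.logout();
        }
    }
}
{code}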
h3. Merge and update

The Node.merge() and Node.update() methods both call NodeImpl.internalMerge(), which acquires a new session on the source workspace and copies relevant content to the current workspace.

# Multiple non-overlapping instances of the following
## Source workspace read lock, for copying content to the current workspace
## Current workspace read lock, for comparing the current status with the one being merged
# Current workspace *write lock*, for persisting the changes
## Version store read lock, for checking references
## Version store *write lock*, for persisting updated back-references

The nested locks above are discussed in the section on the save operation.

h3. Copy, clone and move

The various copy(), clone() and move() methods in WorkspaceImpl use the similarly named methods in BatchedItemOperations to perform batch operations within a single workspace or across two workspaces. From a synchronization perspective these operations are much like the merge and update operations above; the difference is mostly that the source workspace may be the same as the current workspace.

# Multiple non-overlapping instances of the following
## Source workspace read lock, for copying content to the current workspace
## Current workspace read lock, for comparing the current status with the one being copied
# Current workspace *write lock*, for persisting the changes
## Version store read lock, for checking references
## Version store *write lock*, for persisting updated back-references

The nested locks above are discussed in the section on the save operation.

h3. Checkin

The NodeImpl.checkin() method first creates a new version of the node in the shared version store and then updates the appropriate mix:versionable properties of the node.

# Workspace read lock, for sanity checks and other preliminary work
# Versioning *write lock*, for creating the new version in the version store
## Workspace read lock, for copying content to the new version
## Version store *write lock*, for persisting the new version
# Versioning read lock, for accessing the newly created version
## Version store read lock, for reading the new version
# Workspace *write lock*, for updating the node with references to the new version
## Version store read lock, for checking references
## Version store *write lock*, for persisting updated back-references

The overlapping lock region above is not troublesome as there are no cases where a versioning lock is acquired within the scope of a workspace lock. Note that there previously were such cases, but this code review shows that all of them have since been resolved.

The nested locks above are discussed in the sections on versioning read access and the save operation.

h3. Checkout

The NodeImpl.checkout() method simply does some sanity checks and updates the versioning metadata of the node to reflect the changed state. No access to the shared version store is needed.

# Workspace read lock, for sanity checks
# Workspace *write lock*, for updating the node to reflect the checked-out state
## Version store read lock, for checking references

The nested lock above is discussed in the section on the save operation.

h3. Restore

The various Node.restore() and Workspace.restore() methods all end up calling NodeImpl.internalRestore(), which copies the contents of the selected version back to the workspace. Finally the changes are persisted with an ItemImpl.save() call.

# Multiple non-overlapping instances of the following:
## Versioning read lock, for copying content back to the workspace
## Workspace read lock, for comparing the current state with the version being restored
# Workspace *write lock*, for persisting the changes
## Version store read lock, for checking references
## Version store *write lock*, for persisting updated back-references

The nested locks above are discussed in the section on the save operation.
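Tying the three versioning operations together, a minimal round trip through the standard JCR API looks like this; the node is assumed to already be mix:versionable, as in the save sketch above, and the property name is arbitrary.

{code:java}
import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.version.Version;

public class VersioningRoundTrip {
    // Sketch: exercises the checkin, checkout and restore sequences above.
    static void roundTrip(Session session, Node versionable) throws Exception {
        Version first = versionable.checkin();  // versioning + version store write locks
        versionable.checkout();                 // workspace write lock only
        versionable.setProperty("title", "changed");
        session.save();                         // save sequence, no new history
        versionable.restore(first, false);      // copies the version back, then saves
    }
}
{code}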
h3. Transaction commit

As discussed in the architecture section above, a transaction context overrides all the other write operations in favor of the two-phase commit driven by the transaction manager. The Jackrabbit part of a potentially distributed transaction is coordinated by the XASessionImpl class, which causes the following locking behavior:

# Versioning *write lock*, for the entire commit
## Version store *write lock*, for persisting modified version histories
### Workspace *write lock*, for persisting modified content

The curious ordering of the locks is caused by the way the prepare and commit parts of the different transaction components are nested. This nesting of the workspace lock within the version store lock is a bit troublesome in comparison with the nesting in read operations and non-transactional writes, where the order of the locks is reversed. The nesting order here cannot be easily changed, as any new versions and version histories need to be persisted before the workspace content that refers to them. Possible solutions could be either to disable or redesign the reference checks done in a transactional context, or to relax the transaction semantics by persisting the version history changes already in the prepare phase, in which case the version store lock wouldn't need to cover the workspace lock. However, even before this issue is fixed, the impact is quite limited and can easily be worked around by typical clients.

In read operations the version store read lock is only acquired after the workspace lock when reading content in /jcr:system/jcr:versionStorage. Clients that never look at the /jcr:system/jcr:versionStorage tree and use JCR API methods like getVersionHistory() to access version information will not trigger the potential deadlock case.

Write operations can only cause a deadlock when both transactional and non-transactional writes are performed concurrently against the same repository. A repository that is consistently accessed either transactionally or non-transactionally will not trigger this deadlock. Note that this restriction is workspace-specific, i.e. one workspace can safely be written to transactionally even if another workspace is concurrently written to non-transactionally.

h2. Summary and future work

This review shows that while the internal locking behaviour in Jackrabbit is still far from simple, there aren't any major deadlock scenarios remaining. The two issues identified in the review can be easily avoided by following these two rules:

* Use the JCR versioning API instead of the /jcr:system/jcr:versionStorage tree to access version information (see the sketch below)
* Don't mix concurrent transactional and non-transactional writes to a single workspace

The transaction commit subsection above outlines some possible solutions to make even these workarounds unnecessary.
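The first rule in code: both calls below reach the same version information, but, per the review above, they acquire the workspace and version store locks in different orders. The class and method names are invented for illustration; the JCR calls themselves are standard.

{code:java}
import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.version.VersionHistory;

public class SafeVersionAccess {
    static VersionHistory preferred(Node node) throws Exception {
        // Versioning read lock first, then the version store read lock: safe.
        return node.getVersionHistory();
    }

    static Node discouraged(Session session) throws Exception {
        // Workspace read lock first, then the version store read lock:
        // can deadlock against a concurrent transaction commit.
        return session.getRootNode().getNode("jcr:system/jcr:versionStorage");
    }
}
{code}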
The following other potential improvements were identified during the code review:

* Storing the version history back-references in the workspaces that contain the references would simplify a lot of code and remove a major source of interaction between the workspace and the version store when updating content. The downside of this change is that removing versions and version histories would be much more difficult, as all workspaces would need to be checked for potential references.
* The current design contains lots of cases where read locks are acquired and released multiple times in sequence. This is often caused by the need to check the transient space when reading something from the repository. It might be useful to extend the workspace read lock to also cover the transient spaces, even though the transient spaces would still be session-specific.
* Adopting a single global repository lock for all per-repository components would simplify lots of code at the expense of some performance.