From: Marcel Reutegger <marcel.reutegger@gmx.net>
Date: Thu, 02 Feb 2006 16:14:58 +0100
To: jackrabbit-dev@incubator.apache.org
Subject: Re: DP Persistence manager implementation

Miro Walker wrote:
> We've been discussing the DB PM implementation, and have a couple of
> questions regarding the
> implementation of this. At the moment, the Simple DB PM appears to
> have been implemented using a single connection with all write
> operations synchronised on a single object. This would imply that all
> writes to the database are single threaded, effectively making any
> application using it also run single threaded for write operations.
> This appears to have two implications:

This is not quite true. The actual store operation on the persistence
manager is synchronized, but most write calls from different threads to
the JCR API in Jackrabbit will not block each other, because those
changes are made in a private transient scope. Only the final save or
commit of the transaction is serialized, and that is only one part of
the whole write process.

> 1. Performance - in a multi-user system, having single-threaded writes
> to the database will make the JDBC connection a serious bottleneck as
> soon as the application comes under load. It also means that any
> background processing that needs to iterate over the repository making
> changes (and we have a few of those) will effectively bring all other
> users to a grinding halt.

This depends very much on the use case. Again, all changes that such a
background process makes are first made in a transient scope, and other
sessions are affected, if at all, only when the changes are stored in
the persistence manager. While one session stores changes, other
sessions are still able to read items, as long as those are available
in their LocalItemStateManager. Only when other sessions access items
that are not available in their LocalItemStateManager will they be
blocked until the store is finished.

> 2. Transactions - we haven't tested this (as the recent support for
> transactions in versioning operations has not been integrated into our
> system), but it appears that if a single connection is being used,
> then we can only have a single transaction active at any one time.
> So, if each user tries to execute a transaction with multiple write
> operations in it, and these transactions are to be propagated through
> to the database, then each transaction must complete before the next
> can begin. This would mean either that we get exceptions if the system
> attempts to interleave operations from different transactions, or that
> each transaction must complete in full before another can begin,
> further compounding the performance issue.

The scopes of a JCR transaction and a transaction on the underlying
database used by Jackrabbit are not the same. A JCR transaction starts
with the first modified item, whereas the transaction on the underlying
database starts with the call to Item.save(), Session.save() or the JTA
transaction commit (whatever you prefer ;)). That basically means JCR
transactions can run in parallel most of the time; only the commit
phase of the JCR transaction is serialized.

> In addition to the implications of using a single synchronised
> connection, another issue appears to be that the system will be unable
> to recover from a connection failure. For example, if the system were
> deployed onto a highly available database cluster, then in the event
> of DB instance failure, any open connections will be killed, but can
> quite happily be reopened later. Jackrabbit appears to create a
> connection on initialisation, and has no way to recover if that
> connection is killed.

This is certainly an issue with the SimpleDbPersistenceManager. I guess
that's why it is called Simple... IMO the purpose of the
SimpleDbPersistenceManager is mainly embedded databases, where a
connection failure is highly unlikely, as there is no network in
between.

> I know that questions around implementing support for connection
> pooling on the DB have been raised before and then dismissed as
> unimportant, but this appears to me to be pretty fundamental.
> By using a connection pool implementation that supports recreating
> dead connections and supports tying a connection to a transaction
> context, multiple transactions could run in parallel, helping
> throughput and making the system more reliable.

Even if such a persistence manager allows concurrent writes, it is
still the responsibility of the caller to ensure consistency. In our
case that's the SharedItemStateManager, and that's the place where
transactions are currently serialized, but only on commit. If
concurrent write performance should become a real issue, that's where
we first have to deal with it.

regards
marcel
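The write path described in this thread (per-session transient changes, with only the final store serialized) can be illustrated with a short, self-contained sketch. This is a toy model, not Jackrabbit code: the class and method names below are invented for illustration, and the point is only that concurrent edits in private transient scopes never contend, while the synchronized store is the single serialized step:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the pattern discussed above: each session collects its
// changes in a private transient scope, so concurrent edits never block;
// only the final store into the shared persistence layer is serialized.
public class TransientScopeModel {

    // Stands in for the persistence manager: one synchronized store,
    // analogous to the synchronized store in SimpleDbPersistenceManager.
    static class PersistenceManager {
        final Map<String, String> items = new HashMap<>();

        synchronized void store(Map<String, String> changes) {
            items.putAll(changes);
        }
    }

    // Stands in for a JCR session with its private transient scope.
    static class Session {
        final PersistenceManager pm;
        final Map<String, String> transientChanges = new HashMap<>();

        Session(PersistenceManager pm) { this.pm = pm; }

        // Unsynchronized: many sessions can modify items concurrently.
        void setProperty(String path, String value) {
            transientChanges.put(path, value);
        }

        // Plays the role of Session.save(): the only serialized step.
        void save() {
            pm.store(transientChanges);
            transientChanges.clear();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PersistenceManager pm = new PersistenceManager();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            threads.add(new Thread(() -> {
                Session s = new Session(pm);
                // These writes live in the session's transient scope and
                // never contend with other threads...
                for (int j = 0; j < 100; j++) {
                    s.setProperty("/node" + id + "/prop" + j, "v" + j);
                }
                // ...only this store is serialized.
                s.save();
            }));
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println(pm.items.size()); // 400
    }
}
```

The four threads spend almost all of their time in unsynchronized transient writes; the four brief store calls are the only serialized work, which is the distinction Marcel draws between write calls and the commit phase.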
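The recovery behaviour Miro asks about can also be sketched without Jackrabbit. The sketch below is a hypothetical pattern, not an existing Jackrabbit class: it checks the connection before every store and reopens it when it has died, which is roughly the service a pooling DataSource would provide, in contrast to opening one connection at initialisation and keeping it forever:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of reconnect-on-failure. A real implementation
// would wrap java.sql.Connection obtained from a DataSource; here a
// minimal stand-in interface keeps the example self-contained.
public class ReconnectingStore {

    interface Connection {
        boolean isValid();
        void write(String data);
    }

    interface ConnectionFactory {
        Connection open();
    }

    private final ConnectionFactory factory;
    private Connection connection;

    ReconnectingStore(ConnectionFactory factory) {
        this.factory = factory;
        this.connection = factory.open();
    }

    // Validate before every store and transparently replace a dead
    // connection, instead of failing forever once it is killed.
    synchronized void store(String data) {
        if (!connection.isValid()) {
            connection = factory.open();
        }
        connection.write(data);
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        ConnectionFactory factory = () -> {
            opens.incrementAndGet();
            return new Connection() {
                int writes = 0;
                public boolean isValid() { return writes < 2; } // dies after 2 writes
                public void write(String data) { writes++; }
            };
        };
        ReconnectingStore store = new ReconnectingStore(factory);
        for (int i = 0; i < 6; i++) {
            store.store("item" + i); // survives two simulated connection deaths
        }
        System.out.println(opens.get()); // 3: initial connection plus two reopens
    }
}
```

The store itself is still synchronized here, so this addresses only the recovery point, not concurrent writes; as noted above, concurrency would have to be handled where consistency is enforced, in the SharedItemStateManager.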