From: Marcel Reutegger <marcel.reutegger@gmx.net>
Date: Thu, 02 Feb 2006 16:14:58 +0100
To: jackrabbit-dev@incubator.apache.org
Subject: Re: DP Persistence manager implementation

Miro Walker wrote:
> We've been discussing the DB PM implementation, and have a couple of
> questions regarding the
> implementation of this. At the moment, the Simple DB PM appears to
> have been implemented using a single connection with all write
> operations synchronised on a single object. This would imply that all
> writes to the database are single threaded, effectively making any
> application using it also run single threaded for write operations.
> This appears to have two implications:

This is not quite true. The actual store operation on the persistence
manager is synchronized, but most write calls from different threads to
the JCR API in Jackrabbit will not block each other, because those
changes are made in a private transient scope. Only the final save or
commit of the transaction is serialized, and that is only one part of
the whole write process.

> 1. Performance - in a multi-user system, having single-threaded writes
> to the database will make the JDBC connection a serious bottleneck as
> soon as the application comes under load. It also means that any
> background processing that needs to iterate over the repository making
> changes (and we have a few of those) will effectively bring all other
> users to a grinding halt.

This depends very much on the use case. Again, all changes that such a
background process makes are first made in a transient scope, and other
sessions are affected, if at all, only when the changes are stored in
the persistence manager. While one session stores changes, other
sessions are still able to read items, as long as those are available
in their LocalItemStateManager. Only when other sessions access items
that are not available in their LocalItemStateManager will they be
blocked until the store is finished.

> 2. Transactions - we haven't tested this (as the recent support for
> transactions in versioning operations has not been integrated into our
> system), but it appears that if a single connection is being used,
> then we can only have a single transaction active at any one time.
> So, if each user tries to execute a transaction with multiple write
> operations in it, and these transactions are to be propagated through
> to the database, then each transaction must complete before the next
> can begin. This would mean either that we get exceptions if the system
> attempts to interleave operations from different transactions, or that
> each transaction must complete in full before another can begin,
> further compounding the performance issue.

The scopes of a JCR transaction and a transaction on the underlying
database used by Jackrabbit are not the same. A JCR transaction starts
with the first modified item, whereas the transaction on the underlying
database starts with the call to Item.save(), Session.save() or the JTA
transaction commit (whatever you prefer ;)). That basically means JCR
transactions can run in parallel most of the time; only the commit
phase of the JCR transaction is serialized.

> In addition to the implications of using a single synchronised
> connection, another issue appears to be that the system will be unable
> to recover from a connection failure. For example, if the system were
> deployed onto a highly available database cluster, then in the event
> of DB instance failure, any open connections will be killed, but can
> quite happily be reopened later. Jackrabbit appears to create a
> connection on initialisation, and has no way to recover if that
> connection is killed.

This is certainly an issue with the SimpleDbPersistenceManager. I guess
that's why it is called Simple... IMO the purpose of the
SimpleDbPersistenceManager is mainly embedded databases, where a
connection failure is highly unlikely, as there is no network in
between.

> I know that questions around implementing support for connection
> pooling on the DB have been raised before and then dismissed as
> unimportant, but this appears to me to be pretty fundamental.
> By using a connection pool implementation that supports recreating
> dead connections and supports tying a connection to a transaction
> context, multiple transactions could run in parallel, helping
> throughput and making the system more reliable.

Even if such a persistence manager allows concurrent writes, it is
still the responsibility of the caller to ensure consistency. In our
case that's the SharedItemStateManager, and that's the place where
transactions are currently serialized, but only on commit. If
concurrent write performance should become a real issue, that's where
we first have to deal with it.

regards
marcel
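The write path described in this thread (per-session transient changes, with only the final store serialized) can be illustrated with a short, self-contained sketch. This is a toy model, not Jackrabbit code: the class and method names below are invented for illustration, and the point is only that concurrent edits in private transient scopes never contend, while the synchronized store is the single serialized step:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the pattern discussed above: each session collects its
// changes in a private transient scope, so concurrent edits never block;
// only the final store into the shared persistence layer is serialized.
public class TransientScopeModel {

    // Stands in for the persistence manager: one synchronized store,
    // analogous to the synchronized store in SimpleDbPersistenceManager.
    static class PersistenceManager {
        final Map<String, String> items = new HashMap<>();

        synchronized void store(Map<String, String> changes) {
            items.putAll(changes);
        }
    }

    // Stands in for a JCR session with its private transient scope.
    static class Session {
        final PersistenceManager pm;
        final Map<String, String> transientChanges = new HashMap<>();

        Session(PersistenceManager pm) { this.pm = pm; }

        // Unsynchronized: many sessions can modify items concurrently.
        void setProperty(String path, String value) {
            transientChanges.put(path, value);
        }

        // Plays the role of Session.save(): the only serialized step.
        void save() {
            pm.store(transientChanges);
            transientChanges.clear();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PersistenceManager pm = new PersistenceManager();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            threads.add(new Thread(() -> {
                Session s = new Session(pm);
                // These writes live in the session's transient scope and
                // never contend with other threads...
                for (int j = 0; j < 100; j++) {
                    s.setProperty("/node" + id + "/prop" + j, "v" + j);
                }
                // ...only this store is serialized.
                s.save();
            }));
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println(pm.items.size()); // 400
    }
}
```

The four threads spend almost all of their time in unsynchronized transient writes; the four brief store calls are the only serialized work, which is the distinction Marcel draws between write calls and the commit phase.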
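The recovery behaviour Miro asks about can also be sketched without Jackrabbit. The sketch below is a hypothetical pattern, not an existing Jackrabbit class: it checks the connection before every store and reopens it when it has died, which is roughly the service a pooling DataSource would provide, in contrast to opening one connection at initialisation and keeping it forever:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of reconnect-on-failure. A real implementation
// would wrap java.sql.Connection obtained from a DataSource; here a
// minimal stand-in interface keeps the example self-contained.
public class ReconnectingStore {

    interface Connection {
        boolean isValid();
        void write(String data);
    }

    interface ConnectionFactory {
        Connection open();
    }

    private final ConnectionFactory factory;
    private Connection connection;

    ReconnectingStore(ConnectionFactory factory) {
        this.factory = factory;
        this.connection = factory.open();
    }

    // Validate before every store and transparently replace a dead
    // connection, instead of failing forever once it is killed.
    synchronized void store(String data) {
        if (!connection.isValid()) {
            connection = factory.open();
        }
        connection.write(data);
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        ConnectionFactory factory = () -> {
            opens.incrementAndGet();
            return new Connection() {
                int writes = 0;
                public boolean isValid() { return writes < 2; } // dies after 2 writes
                public void write(String data) { writes++; }
            };
        };
        ReconnectingStore store = new ReconnectingStore(factory);
        for (int i = 0; i < 6; i++) {
            store.store("item" + i); // survives two simulated connection deaths
        }
        System.out.println(opens.get()); // 3: initial connection plus two reopens
    }
}
```

The store itself is still synchronized here, so this addresses only the recovery point, not concurrent writes; as noted above, concurrency would have to be handled where consistency is enforced, in the SharedItemStateManager.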