Message-ID: <90a8d1c00703060412l5d97fb4asb5a508c6127b13ea@mail.gmail.com>
Date: Tue, 6 Mar 2007 13:12:23 +0100
From: "Stefan Guggisberg"
To: users@jackrabbit.apache.org
Subject: Re: Database PersistenceManagers (was "Results of a JR Oracle test that we conducted)
References: <90a8d1c00703030711p198bdfafg62abcb384720f1b1@mail.gmail.com>

On 3/5/07, Bryan Davis wrote:
>
> On 3/3/07 7:11 AM, "Stefan Guggisberg" wrote:
>
> > hi bryan
> >
> > On 3/2/07, Bryan Davis wrote:
> >> What persistence manager are you using?
> >>
> >> Our tests indicate that the stock persistence managers are a significant
> >> bottleneck for both writes and also initial reads to load the transient
> >> store (on the order of .5 seconds per node when using a remote database like
> >> MSSQL or Oracle).
> >
> > what do you mean by "load the transient store"?
> >
> >>
> >> The stock db persistence managers have all methods marked as "synchronized",
> >> which blocks on the classdef (which means that even different persistence
> >> managers for different workspaces will serialize all load, exists and store
> >
> > assuming you're talking about DatabasePersistenceManager:
> > the store/destroy methods are 'synchronized' on the instance, not on
> > the 'classdef'.
> > see e.g.
> > http://java.sun.com/docs/books/tutorial/essential/concurrency/syncmeth.html
> >
> > the load/exists methods are synchronized on the specific prepared stmt they're
> > using.
> >
> > since every workspace uses its own persistence manager instance i can't
> > follow your conclusion that all load, exists and store operations would
> > be globally serialized across all workspaces.
>
> Hm, this is my bad... It does seem that sync methods are on the instance.
> Since the db persistence manager has "synchronized" on load, store and
> exists, though, this would still serialize all of these operations for a
> particular workspace.

?? the load methods are *not* synchronized. they contain a section which
is synchronized on the particular prepared stmt.

wrt synchronization:
concurrency is controlled outside the persistence manager on a higher level.
eliminating the method synchronization would imo therefore have *no* impact
on concurrency/performance.

cheers
stefan

>
> >> operations). Presumably this is because they allocate a JDBC connection at
> >> startup and use it throughout, and the connection object is not
> >> multithreaded.
> >
> > what leads you to this assumption?
>
> Are there other requirements that all of these operations are serialized for
> a particular PM instance? This seems like a pretty serious bottleneck (and,
> in fact, is a pretty serious bottleneck when the database is remote from the
> repository).
>
> >>
> >> This problem isn't as noticeable when you are using embedded Derby and
> >> reading/writing to the file system, but when you are doing a network
> >> operation to a database server, the network latency in combination with the
> >> serialization of all database operations results in a significant
> >> performance degradation.
> >
> > again: serialization of 'all' database operations?
>
> The distinction between "all" and "all for a workspace" would really only be
> relevant during versioning, right?
>
> >>
> >> The new bundle persistence manager (which isn't yet in SVN) improves things
> >> dramatically since it inlines properties into the node, so loading or
> >> persisting a node is only one operation (plus the additional connection for
> >> the LOB) instead of one for the node and one for each property. The
> >> bundle persistence manager also uses prepared statements and keeps a
> >> PM-level cache of nodes (with properties) and also non-existent nodes (which
> >> permits many exists() calls to return without accessing the database).
> >>
> >> Changing all db persistence managers to use a datasource and get and release
> >> connections inside of load, exists and store operations and eliminating the
> >> method synchronization is a relatively simple change that further improves
> >> performance for connecting to database servers.
> >
> > the use of datasources, connection pools and the like has been discussed
> > in extenso on the list. see e.g.
> > http://www.mail-archive.com/jackrabbit-dev@incubator.apache.org/msg05181.html
> > http://issues.apache.org/jira/browse/JCR-313
> >
> > i don't see how getting & releasing connections in every load, exists and
> > store call would improve performance. could you please elaborate?
> >
> > please note that you wouldn't be able to use prepared statements over multiple
> > load, store etc operations because you'd have to return the connection
> > at the end of every call. the performance might therefore be even worse.
> >
> > further note that write operations must occur within a single jdbc
> > transaction, i.e. you can't get a new connection for every store/destroy
> > operation.
> >
> > wrt synchronization:
> > concurrency is controlled outside the persistence manager on a higher level.
> > eliminating the method synchronization would imo therefore have *no* impact
> > on concurrency/performance.
>
> So you are saying that it is impossible to concurrently load or store data
> in Jackrabbit?
>
> >> There is a persistence manager with an ASL license called
> >> "DataSourcePersistenceManager" which seems to be the PM of choice for people
> >> using Magnolia (which is backed by Jackrabbit). It also uses prepared
> >> statements and eliminates the current single-connection issues associated
> >> with all of the stock db PMs. It doesn't seem to have been submitted back
> >> to the Jackrabbit project. If you Google for
> >> "com.iorgagroup.jackrabbit.core.state.db.DataSourcePersistenceManager" you
> >> should be able to find it.
> >
> > thanks for the hint. i am aware of this pm and i had a look at it a couple of
> > months ago. the major issue was that it didn't implement the correct/required
> > semantics. it used a new connection for every write operation which
> > clearly violates the contract that the write operations should occur within
> > a jdbc transaction bracket. further it creates a prepared stmt on every
> > load, store etc. which is hardly efficient...
>
> Yes, this PM does have this issue. The bundle PM implements prepared
> statements in the correct way.
>
> >> Finally, if you always use the Oracle 10g JDBC drivers, you do not need to
> >> use the Oracle-specific PMs because the 10g drivers support the standard
> >> BLOB API (in addition to the Oracle-specific BLOB API required by the older
> >> 9i drivers). This is true even if you are connecting to an older database
> >> server as the limitation was in the driver itself. Frankly you should never
> >> use the 9i drivers as they are pretty buggy and the 10g drivers represent a
> >> complete rewrite. Make sure you use the new driver package because the 10g
> >> driver JAR also includes the older 9i drivers for backward-compatibility.
> >> The new driver is in a new package (can't remember the exact name off the
> >> top of my head).
> >
> > thanks for the information.
> >
> > cheers
> > stefan
>
> We are very interested in getting a good understanding of the specifics of
> how PMs work, as initial reads and writes, according to our profiling, are
> spending 80-90% of the time inside the PM.
>
> Bryan.
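
The lock-scope point argued in this thread can be checked with a small, self-contained Java sketch. It is only an illustration under stated assumptions: SketchPersistenceManager, its store/slowStore/load methods, and the loadStmt field are hypothetical names invented for this example and are not Jackrabbit code. The sketch shows that a synchronized instance method holds the monitor of the instance (not of the class object), that a section synchronized on a separate object does not hold the instance monitor, and that a second instance is therefore not blocked while the first one is busy.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a database persistence manager. This is NOT
// Jackrabbit code; class, field, and method names are made up for the sketch.
class SketchPersistenceManager {
    // Stand-in for the prepared statement that a load section locks on.
    private final Object loadStmt = new Object();

    // A synchronized instance method locks 'this', not the class object.
    // Returns { holds instance lock, holds class lock }.
    synchronized boolean[] store() {
        return new boolean[] {
            Thread.holdsLock(this),
            Thread.holdsLock(SketchPersistenceManager.class)
        };
    }

    // Same instance lock as store(), held until 'release' is counted down.
    synchronized void slowStore(CountDownLatch entered, CountDownLatch release) {
        entered.countDown();
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Not a synchronized method: only the inner section is guarded, and the
    // lock is the statement object rather than the instance.
    // Returns { holds statement lock, holds instance lock }.
    boolean[] load() {
        synchronized (loadStmt) {
            return new boolean[] {
                Thread.holdsLock(loadStmt),
                Thread.holdsLock(this)
            };
        }
    }
}

class SyncScopeDemo {
    // Returns true if instance b can run store() while another thread is
    // still inside a synchronized method of instance a.
    static boolean secondInstanceNotBlocked() {
        SketchPersistenceManager a = new SketchPersistenceManager();
        SketchPersistenceManager b = new SketchPersistenceManager();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        try {
            Thread busy = new Thread(() -> a.slowStore(entered, release));
            busy.start();
            entered.await(); // a's monitor is now held by 'busy'

            Thread other = new Thread(() -> { b.store(); done.countDown(); });
            other.start();
            boolean finished = done.await(5, TimeUnit.SECONDS);

            release.countDown();
            busy.join();
            other.join();
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean[] s = new SketchPersistenceManager().store();
        boolean[] l = new SketchPersistenceManager().load();
        System.out.println("store holds instance lock: " + s[0]);
        System.out.println("store holds class lock:    " + s[1]);
        System.out.println("load holds stmt lock:      " + l[0]);
        System.out.println("load holds instance lock:  " + l[1]);
        System.out.println("second instance not blocked: " + secondInstanceNotBlocked());
    }
}
```

The sketch says nothing about how concurrency is controlled at the higher level mentioned in the thread; it only demonstrates the Java locking semantics that the discussion relies on.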