jackrabbit-users mailing list archives

From Bryan Davis <brda...@bea.com>
Subject Re: Database PersistenceManagers (was "Results of a JR Oracle test that we conducted")
Date Mon, 05 Mar 2007 18:23:07 GMT



On 3/3/07 7:11 AM, "Stefan Guggisberg" <stefan.guggisberg@gmail.com> wrote:

> hi bryan
> 
> On 3/2/07, Bryan Davis <brdavis@bea.com> wrote:
>> What persistence manager are you using?
>> 
>> Our tests indicate that the stock persistence managers are a significant
>> bottleneck for both writes and initial reads that load the transient
>> store (on the order of 0.5 seconds per node when using a remote database
>> such as MSSQL or Oracle).
> 
> what do you mean by "load the transient store"?
> 
>> 
>> The stock db persistence managers have all methods marked as "synchronized",
>> which blocks on the classdef (which means that even different persistence
>> managers for different workspaces will serialize all load, exists and store
> 
> assuming you're talking about DatabasePersistenceManager:
> the store/destroy methods are 'synchronized' on the instance, not on
> the 'classdef'.
> see e.g. 
> http://java.sun.com/docs/books/tutorial/essential/concurrency/syncmeth.html
> 
> the load/exists methods are synchronized on the specific prepared stmt they're
> using.
> 
> since every workspace uses its own persistence manager instance i can't
> follow your conclusion that all load, exists and store operations would be
> be globally serialized across all workspaces.

Hm, this is my bad... the synchronized methods are indeed on the instance,
not on the class.  Since the db persistence manager synchronizes load, store
and exists, though, this would still serialize all of those operations for a
particular workspace.

>> operations).  Presumably this is because they allocate a JDBC connection at
>> startup and use it throughout, and the connection object is not
>> multithreaded.
> 
> what leads you to this assumption?

Are there other requirements that force all of these operations to be
serialized for a particular PM instance?  This seems like a serious
bottleneck in principle (and, in fact, it is one in practice when the
database is remote from the repository).
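
To put a rough number on it: assuming (purely for illustration) a ~5 ms
network round trip to the database, strictly serialized load/exists/store
calls would cap a single workspace at roughly 1000 / 5 = 200 persistence
operations per second, no matter how many sessions are reading or writing.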

>> 
>> This problem isn't as noticeable when you are using embedded Derby and
>> reading/writing to the file system, but when you are doing a network
>> operation to a database server, the network latency in combination with the
>> serialization of all database operations results in a significant
>> performance degradation.
> 
> again: serialization of 'all' database operations?

The distinction between all operations globally and all operations for a
single workspace would really only be relevant during versioning, right?
 
>> 
>> The new bundle persistence manager (which isn't yet in SVN) improves things
>> dramatically since it inlines properties into the node, so loading or
>> persisting a node is only one operation (plus the additional connection for
>> the LOB) instead of one for the node and one for each property.  The
>> bundle persistence manager also uses prepared statements and keeps a
>> PM-level cache of nodes (with properties) and also non-existent nodes (which
>> permits many exists() calls to return without accessing the database).
>> 
>> Changing all db persistence managers to use a datasource and get and release
>> connections inside of load, exists and store operations and eliminating the
>> method synchronization is a relatively simple change that further improves
>> performance for connecting to database servers.
> 
> the use of datasources, connection pools and the like have been discussed
> in extenso on the list. see e.g.
> http://www.mail-archive.com/jackrabbit-dev@incubator.apache.org/msg05181.html
> http://issues.apache.org/jira/browse/JCR-313
> 
> i don't see how getting & releasing connections in every load, exists and
> store
> call would improve performance. could you please elaborate?
> 
> please note that you wouldn't be able to use prepared statements over
> multiple load, store etc operations because you'd have to return the
> connection at the end of every call. the performance might therefore be
> even worse.
> 
> further note that write operations must occur within a single jdbc
> transaction, i.e. you can't get a new connection for every store/destroy
> operation.
> 
> wrt synchronization:
> concurrency is controlled outside the persistence manager on a higher level.
> eliminating the method synchronization would imo therefore have *no* impact
> on concurrency/performance.

So you are saying that it is impossible to concurrently load or store data
in Jackrabbit?
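
What I had in mind was narrower than that: for the read path only, something
along these lines (a sketch, assuming a container-managed DataSource; names
are made up, and writes would still keep a single connection for the whole
transaction bracket, as you point out):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.sql.DataSource;

    // Sketch: reads borrow a pooled connection per call instead of sharing
    // one synchronized connection, so concurrent load()/exists() calls can
    // overlap, bounded only by the pool size.
    public class SketchDataSourceReads {

        private final DataSource ds;    // pool managed by the container or app

        public SketchDataSourceReads(DataSource ds) {
            this.ds = ds;
        }

        public byte[] load(String nodeId) throws Exception {
            Connection con = ds.getConnection();    // borrow from the pool
            try {
                PreparedStatement stmt =
                    con.prepareStatement("select DATA from NODE where ID = ?");
                try {
                    stmt.setString(1, nodeId);
                    ResultSet rs = stmt.executeQuery();
                    return rs.next() ? rs.getBytes(1) : null;
                } finally {
                    stmt.close();
                }
            } finally {
                con.close();                        // return to the pool
            }
        }
    }

I take your point that the prepared statement then has to be re-created (or
cached by the driver/pool) on each call; whether that costs more than the
concurrency we lose today is exactly what we would like to measure.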

>> There is a persistence manager with an ASL license called
>> "DataSourcePersistenceManager" which seems to the PM of choice for people
>> using Magnolia (which is backed by Jackrabbit).  It also uses prepared
>> statements and eliminates the current single-connection issues associated
>> with all of the stock db PMs.  It doesn't seem to have been submitted back
>> to the Jackrabbit project.  If you Google for
>> "com.iorgagroup.jackrabbit.core.state.db.DataSourcePersistenceManager" you
>> should be able to find it.
> 
> thanks for the hint. i am aware of this pm and i had a look at it a couple of
> months ago. the major issue was that it didn't implement the correct/required
> semantics. it used a new connection for every write operation which
> clearly violates the contract that the write operations should occur within
> a jdbc transaction bracket. further it creates a prepared stmt on every
> load, store etc. which is hardly efficient...

Yes, this PM does have that issue.  The bundle PM handles prepared
statements correctly.
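
For comparison, my reading of the pattern you describe for load/exists
(prepare once, synchronize on the statement itself) is roughly the
following, again just a sketch with made-up names rather than the real code:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Sketch: the statement is prepared once and reused; callers serialize
    // only on that statement, and it is never re-parsed per call.
    public class SketchPreparedStatementReuse {

        private final PreparedStatement loadStmt;

        public SketchPreparedStatementReuse(Connection con) throws Exception {
            loadStmt = con.prepareStatement("select DATA from NODE where ID = ?");
        }

        public byte[] load(String nodeId) throws Exception {
            synchronized (loadStmt) {
                loadStmt.clearParameters();
                loadStmt.setString(1, nodeId);
                ResultSet rs = loadStmt.executeQuery();
                try {
                    return rs.next() ? rs.getBytes(1) : null;
                } finally {
                    rs.close();
                }
            }
        }
    }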

>> Finally, if you always use the Oracle 10g JDBC drivers, you do not need to
>> use the Oracle-specific PMs because the 10g drivers support the standard
>> BLOB API (in addition to the Oracle-specific BLOB API required by the older
>> 9i drivers).  This is true even if you are connecting to an older database
>> server as the limitation was in the driver itself.  Frankly you should never
>> use the 9i drivers as they are pretty buggy and the 10g drivers represent a
>> complete rewrite.  Make sure you use the new driver package because the 10g
>> driver JAR also includes the older 9i drivers for backward-compatibility.
>> The new driver is in a new package (can't remember the exact name off the
>> top of my head).
> 
> thanks for the information.
> 
> cheers
> stefan

We are very interested in getting a good understanding of the specifics of
how PMs work, since our profiling shows initial reads and writes spending
80-90% of their time inside the PM.
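
One specific piece we would like to confirm our understanding of is the
bundle PM's cache of non-existent nodes; conceptually it appears to be
something like the following (a rough sketch, not the actual
implementation):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Conceptual sketch only: remember both hits and misses so that repeated
    // exists() calls for missing nodes never reach the database.  The
    // loadFromDatabase() call is a placeholder for the real PM read.
    public class SketchExistenceCache {

        private final Map cache = Collections.synchronizedMap(new HashMap());

        public boolean exists(String nodeId) throws Exception {
            Boolean cached = (Boolean) cache.get(nodeId);
            if (cached != null) {
                return cached.booleanValue();       // hit, or a remembered miss
            }
            boolean found = loadFromDatabase(nodeId) != null;
            cache.put(nodeId, Boolean.valueOf(found));  // cache the miss, too
            return found;
        }

        private byte[] loadFromDatabase(String nodeId) throws Exception {
            return null;    // placeholder for the real database read
        }
    }

If that reading is right, it explains why so many exists() calls can return
without ever touching the database.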

Bryan.

