jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: Database PersistenceManagers (was "Results of a JR Oracle test that we conducted)
Date Tue, 06 Mar 2007 12:12:23 GMT
On 3/5/07, Bryan Davis <brdavis@bea.com> wrote:
>
>
>
> On 3/3/07 7:11 AM, "Stefan Guggisberg" <stefan.guggisberg@gmail.com> wrote:
>
> > hi bryan
> >
> > On 3/2/07, Bryan Davis <brdavis@bea.com> wrote:
> >> What persistence manager are you using?
> >>
> >> Our tests indicate that the stock persistence managers are a significant
> >> bottleneck for both writes and also initial reads to load the transient
> >> store (on the order of .5 seconds per node when using a remote database like
> >> MSSQL or Oracle).
> >
> > what do you mean by "load the transient store"?
> >
> >>
> >> The stock db persistence managers have all methods marked as "synchronized",
> >> which blocks on the classdef (which means that even different persistence
> >> managers for different workspaces will serialize all load, exists and store
> >
> > assuming you're talking about DatabasePersistenceManager:
> > the store/destroy methods are 'synchronized' on the instance, not on
> > the 'classdef'.
> > see e.g.
> > http://java.sun.com/docs/books/tutorial/essential/concurrency/syncmeth.html
> >
> > the load/exists methods are synchronized on the specific prepared stmt they're
> > using.
> >
> > since every workspace uses its own persistence manager instance i can't
> > follow your conclusion that all load, exists and store operations would be
> > globally serialized across all workspaces.
>
> Hm, this is my bad... It does seem that sync methods are on the instance.
> Since the db persistence manager has "synchronized" on load, store and
> exists, though, this would still serialize all of these operations for a
> particular workspace.

?? the load methods are *not* synchronized. they contain a section which
is synchronized on the particular prepared stmt.
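
To illustrate the distinction, here is a minimal, self-contained sketch. It is not Jackrabbit code: `DemoPersistenceManager` and its `loadStmt` field are hypothetical stand-ins, and a plain `Object` plays the role of the prepared statement. It only shows that a `synchronized` instance method holds the monitor of the instance (not of the class), while a load-style method that wraps just one section in `synchronized (stmt)` holds only that statement's monitor.

```java
/** Hypothetical stand-in for a db persistence manager; not Jackrabbit code. */
class DemoPersistenceManager {
    // Stand-in for the prepared statement used by load(); one per instance.
    private final Object loadStmt = new Object();

    /** Synchronized instance method: locks 'this', never the class object. */
    synchronized boolean[] store() {
        return new boolean[] {
            Thread.holdsLock(this),                         // instance monitor held
            Thread.holdsLock(DemoPersistenceManager.class)  // class monitor not held
        };
    }

    /** Not a synchronized method: only the statement section is guarded. */
    boolean[] load() {
        boolean holdsInstance = Thread.holdsLock(this);     // no instance lock taken here
        synchronized (loadStmt) {
            return new boolean[] { holdsInstance, Thread.holdsLock(loadStmt) };
        }
    }
}

class SyncScopeDemo {
    public static void main(String[] args) {
        DemoPersistenceManager pm = new DemoPersistenceManager();
        boolean[] s = pm.store();
        boolean[] l = pm.load();
        System.out.println("store: instance=" + s[0] + " class=" + s[1]);
        System.out.println("load:  instance=" + l[0] + " stmt=" + l[1]);
    }
}
```

Two instances of such a class, one per workspace, have separate monitors and separate statement objects, so they cannot block each other.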

<quote from my previous reply>
wrt synchronization:
concurrency is controlled outside the persistence manager on a higher level.
eliminating the method synchronization would imo therefore have *no* impact
on concurrency/performance.
</quote>
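
The argument can be sketched as follows. This is only an illustration under assumptions: `HigherLevelManager` is a hypothetical caller that serializes writes with its own lock before delegating to an unsynchronized `PlainStore`, and neither class is taken from Jackrabbit. If callers are already serialized one level up, at most one thread is ever inside `store()`, so method-level synchronization there would never be contended.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical store with NO method synchronization; not Jackrabbit code. */
class PlainStore {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    void store() {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max); // track peak concurrency
        Thread.yield();                             // invite interleaving
        active.decrementAndGet();
    }

    int maxConcurrentCallers() {
        return maxActive.get();
    }
}

/** Hypothetical higher-level component that controls concurrency itself. */
class HigherLevelManager {
    private final ReentrantLock writeLock = new ReentrantLock();
    private final PlainStore pm = new PlainStore();

    void update() {
        writeLock.lock();
        try {
            pm.store(); // callers arrive here one at a time
        } finally {
            writeLock.unlock();
        }
    }

    /** Runs several threads against one manager and reports peak concurrency in store(). */
    static int peakConcurrency(int threads, int callsPerThread) {
        HigherLevelManager mgr = new HigherLevelManager();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    mgr.update();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return mgr.pm.maxConcurrentCallers();
    }
}
```

Calling `HigherLevelManager.peakConcurrency(8, 1000)` reports how many threads were ever inside `store()` at once. With the outer lock in place that number cannot exceed one, whether or not `store()` is itself synchronized.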

cheers
stefan

>
> >> operations).  Presumably this is because they allocate a JDBC connection at
> >> startup and use it throughout, and the connection object is not
> >> multithreaded.
> >
> > what leads you to this assumption?
>
> Are there other requirements that all of these operations are serialized for
> a particular PM instance?  This seems like a pretty serious bottleneck (and,
> in fact, is a pretty serious bottleneck when the database is remote from the
> repository).
>
> >>
> >> This problem isn't as noticeable when you are using embedded Derby and
> >> reading/writing to the file system, but when you are doing a network
> >> operation to a database server, the network latency in combination with the
> >> serialization of all database operations results in a significant
> >> performance degradation.
> >
> > again: serialization of 'all' database operations?
>
> The distinction between "all" and "all for a workspace" would really only be
> relevant during versioning, right?
>
> >>
> >> The new bundle persistence manager (which isn't yet in SVN) improves things
> >> dramatically since it inlines properties into the node, so loading or
> >> persisting a node is only one operation (plus the additional connection for
> >> the LOB) instead of one for the node and one for each property.  The
> >> bundle persistence manager also uses prepared statements and keeps a
> >> PM-level cache of nodes (with properties) and also non-existent nodes (which
> >> permits many exists() calls to return without accessing the database).
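
The caching idea described above can be sketched roughly as follows. This is not the bundle persistence manager's code: `CachingBundleStore`, `BundleBackend`, `CountingBackend` and the string bundle representation are hypothetical names chosen for illustration. The sketch keeps loaded bundles in one map and remembers ids known to be missing in a set, so repeated exists() calls for either kind of id skip the backend.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical backend: each loadBundle() call stands for one database round trip. */
interface BundleBackend {
    String loadBundle(String nodeId);               // null if the node does not exist
    void storeBundle(String nodeId, String bundle);
}

/** In-memory backend that counts loads, so cache hits are observable. */
class CountingBackend implements BundleBackend {
    private final Map<String, String> rows = new HashMap<>();
    int loads;

    public String loadBundle(String nodeId) {
        loads++;
        return rows.get(nodeId);
    }

    public void storeBundle(String nodeId, String bundle) {
        rows.put(nodeId, bundle);
    }
}

/** Hypothetical PM-level cache with negative entries; not Jackrabbit code. */
class CachingBundleStore {
    private final BundleBackend backend;
    private final Map<String, String> bundles = new HashMap<>(); // node plus properties
    private final Set<String> missing = new HashSet<>();         // ids known to be absent

    CachingBundleStore(BundleBackend backend) {
        this.backend = backend;
    }

    synchronized String load(String nodeId) {
        if (missing.contains(nodeId)) {
            return null;                            // negative cache hit
        }
        String bundle = bundles.get(nodeId);
        if (bundle == null) {
            bundle = backend.loadBundle(nodeId);    // single round trip per node
            if (bundle == null) {
                missing.add(nodeId);
            } else {
                bundles.put(nodeId, bundle);
            }
        }
        return bundle;
    }

    synchronized boolean exists(String nodeId) {
        return load(nodeId) != null;
    }

    synchronized void store(String nodeId, String bundle) {
        backend.storeBundle(nodeId, bundle);
        missing.remove(nodeId);                     // the node exists from now on
        bundles.put(nodeId, bundle);
    }
}
```

A real cache would also need eviction and invalidation on delete, which this sketch leaves out.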
> >>
> >> Changing all db persistence managers to use a datasource and get and release
> >> connections inside of load, exists and store operations and eliminating the
> >> method synchronization is a relatively simple change that further improves
> >> performance for connecting to database servers.
> >
> > the use of datasources, connection pools and the like has been discussed
> > in extenso on the list. see e.g.
> > http://www.mail-archive.com/jackrabbit-dev@incubator.apache.org/msg05181.html
> > http://issues.apache.org/jira/browse/JCR-313
> >
> > i don't see how getting & releasing connections in every load, exists and
> > store call would improve performance. could you please elaborate?
> >
> > please note that you wouldn't be able to use prepared statements over multiple
> > load, store etc. operations because you'd have to return the connection at the
> > end of every call. the performance might therefore be even worse.
> >
> > further note that write operations must occur within a single jdbc
> > transaction, i.e. you can't get a new connection for every store/destroy
> > operation.
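
As a rough illustration of these two constraints, the following sketch writes a whole set of changes on one connection, inside one JDBC transaction, and reuses a single prepared statement for every row. It is an example built on assumptions, not DatabasePersistenceManager code: the `ChangeLogWriter` class, the `NODE` table and its columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

/** Hypothetical helper; table and column names are made up for illustration. */
class ChangeLogWriter {
    private static final String UPDATE_SQL =
            "update NODE set NODE_DATA = ? where NODE_ID = ?";

    /** Stores all changes atomically on the given connection. */
    static void storeAll(Connection con, Map<String, byte[]> changes) throws SQLException {
        con.setAutoCommit(false);                 // open the transaction bracket
        try (PreparedStatement stmt = con.prepareStatement(UPDATE_SQL)) {
            for (Map.Entry<String, byte[]> e : changes.entrySet()) {
                stmt.setBytes(1, e.getValue());
                stmt.setString(2, e.getKey());
                stmt.executeUpdate();             // same statement, same connection
            }
            con.commit();                         // all changes are committed together
        } catch (SQLException ex) {
            con.rollback();                       // none of the changes are kept
            throw ex;
        }
    }
}
```

If each store call fetched its own connection from a pool, the individual updates would land in separate transactions, and the statement would have to be prepared again on every call.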
> >
> > wrt synchronization:
> > concurrency is controlled outside the persistence manager on a higher level.
> > eliminating the method synchronization would imo therefore have *no* impact
> > on concurrency/performance.
>
> So you are saying that it is impossible to concurrently load or store data
> in Jackrabbit?
>
> >> There is a persistence manager with an ASL license called
> >> "DataSourcePersistenceManager" which seems to be the PM of choice for people
> >> using Magnolia (which is backed by Jackrabbit).  It also uses prepared
> >> statements and eliminates the current single-connection issues associated
> >> with all of the stock db PMs.  It doesn't seem to have been submitted back
> >> to the Jackrabbit project.  If you Google for
> >> "com.iorgagroup.jackrabbit.core.state.db.DataSourcePersistenceManager" you
> >> should be able to find it.
> >
> > thanks for the hint. i am aware of this pm and i had a look at it a couple of
> > months ago. the major issue was that it didn't implement the correct/required
> > semantics. it used a new connection for every write operation which
> > clearly violates the contract that the write operations should occur within
> > a jdbc transaction bracket. further it creates a prepared stmt on every
> > load, store etc. which is hardly efficient...
>
> Yes, this PM does have this issue.  The bundle PM implements prepared
> statements in the correct way.
>
> >> Finally, if you always use the Oracle 10g JDBC drivers, you do not need to
> >> use the Oracle-specific PMs because the 10g drivers support the standard
> >> BLOB API (in addition to the Oracle-specific BLOB API required by the older
> >> 9i drivers).  This is true even if you are connecting to an older database
> >> server as the limitation was in the driver itself.  Frankly you should never
> >> use the 9i drivers as they are pretty buggy and the 10g drivers represent a
> >> complete rewrite.  Make sure you use the new driver package because the 10g
> >> driver JAR also includes the older 9i drivers for backward-compatibility.
> >> The new driver is in a new package (can't remember the exact name off the
> >> top of my head).
> >
> > thanks for the information.
> >
> > cheers
> > stefan
>
> We are very interested in getting a good understanding of the specifics of
> how PM's work, as initial reads and writes, according to our profiling, are
> spending 80-90% of the time inside the PM.
>
> Bryan.
>
