jackrabbit-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Database PersistenceManagers (was "Results of a JR Oracle test that we conducted)
Date Mon, 12 Mar 2007 15:16:12 GMT
Hi Bryan,

Bryan Davis wrote:
> It is also tempting to try and push this functionality into the database,
> since it is already doing all of the required locking anyway.  A custom
> transaction manager that delegated to the repository session transaction
> manager (thereby associating JDBC connections with the repository session),
> in conjunction with a stock data source implementation (see below) might do
> the trick.  Of course this would only work with database PM¹s, but perhaps
> other TM¹s could still have the existing SISM locking enabled.  This would
> be good enough for us since we only use database PM¹s, and a better, more
> universal solution could be implemented at a later date.
> Has anyone looked into this issue at all or have any advice / thoughts?

well, it's not that easy. locking in SISM is required because there is a cache 
of item states. the same requirements for a persistence manager also applies to 
this cache. specifically, after the change log got persisted the cache must be 
updated atomically.

there were some recent thoughts about implementing a multi-version cache 
(similar to MVCC in databases), which would support better concurrency. but that 
requires more effort than just removing some synchronized modifiers in the 
source code.

> Issue #2: Multiple issues in database persistence managers.  I believe the
> database persistence managers have multiple issues (please correct me if I
> get any of this wrong).
> 1. JDBC connection details should not be in the repository.xml.  I should be
> free to change the specifics of a particular database connection without it
> constituting a repository initialization parameter change (which is
> effectively what changing the repository.xml is, since it gets copied and
> subsequently accessed from inside the repository itself).  If a host name or
> driver class or even connection URL changes, I should not have to manually
> edit internal repository configuration files to effect the change.

there's a 
org.apache.jackrabbit.core.persistence.db.JNDIDatabasePersistenceManager, which 
exactly does what you need.

> 2. Sharing JDBC connections (and objects obtained from them, like prepared
> statements) between multiple threads is not a good practice.  Even though
> many drivers support such activity, it is not specifically required by the
> spec, and many drivers do not support it.  Even for ones that do, there are
> always a significant list of caveats (like changing the transaction
> isolation of a connection impacting all threads, or rollbacks sometimes
> being executed against the wrong thread).  Plus, as far as I can tell, there
> is also no particular good reason to attempt this in this case.  Connection
> pooling is extremely well understood and there are a variety of
> implementations to choose from (including Apache¹s own in Jakarta Commons).

so far we didn't see any issues with our approach of sharing a connection among 
threads. we are always happy to accept contributions in that area ;)

> 3. Synchronization is  bad (generally speaking of course :).

one could also say (generally speaking) 'no synchronization' is bad. if you have 
a multi-threaded program without any synchronization and is super-fast, but just 
doesn't behave correctly, what good is it then?

> Once the
> multithreaded issues of JDBC are removed (via a connection pool), there are
> no good reasons that I can see to have any synchronization in the database
> persistence managers.

that's correct but won't solve your issue, because of JCR-314.

btw. you can easily implement your own persistence manager based on the 
JNDIDatabasePersistenceManager that provides concurrent reads. I think we didn't 
implement one so far because we didn't see much value in it.

> Since any sort of requirement for synchronized
> operation would be coming from a higher layer, it should also be provided at
> a higher layer.  I have always felt that a good rule of thumb in server code
> is to avoid synchronization at all costs, particular in core server
> functionality (like reading and writing to a repository).  It if extremely
> difficult to fully understand the global implications of synchronized code,
> particular code that is synchronized at a low level.  Any serialization of
> user requests is extremely serious in a multithreaded server and, in my
> experience, will lead to show-stopping performance and scalability issues in
> nearly all cases.  In addition, serialization of requests at such a low
> level probably means that other synchronized code that is intended to be
> properly multithreaded is probably not well tested since the request
> serialization has eliminated (or greatly reduced) the possibility of the
> code being reentered like it normally would be.

please note that this serialization of writes currently only takes place at the 
very last step when changes are stored. all other operations like constraint and 
node type validation, access right checks, check if the save is self contained, 
etc. can be done with multiple threads concurrently.

> The solution to all of these issues, maybe, is to use the standard JDBC
> DataSource interface to encapsulate the details of managing the JDBC
> connections.  If all of the current PM and FS implementations that use JDBC
> were refactored to have a DataSource member and to get and release
> connections inside of each method, then parity with the existing
> implementations could be achieved by providing a default DataSourceLookup
> strategy implemention that simply encapsulated the existing connection
> creation code (ignoring connection release requests).

I can't follow you here, how would that help?

> This would allow us
> (and others) to externally extend the implementation with alternative
> DataSourceLookup strategy implementations, say for accessing a datasource
>>from JNDI, or getting it from a Spring application context.  This solution
> also neatly externalizes all of the details of the actual datasource
> configuration from the repository.xml.

using JNDI is already possible and implementing a database persistence manager 
which obtains its connection from a spring context is fairly easy. you just have 
to implement the method DatabasePersistenceManager.getConnection() accordingly.


View raw message