jackrabbit-dev mailing list archives

From "Thomas Mueller" <thomas.tom.muel...@gmail.com>
Subject Re: Next Generation Persistence
Date Thu, 14 Jun 2007 16:16:45 GMT

I would like to better understand the reasons for NGP. I found the following
issues in JIRA, but I think most of these problems can be solved even
without NGP. Are there any other issues to consider (or issues
without a JIRA entry)?

Allow concurrent writes on the PM. The root problem seems to be that
storing large binary objects blocks other writers?

Global data store for binaries (stream large objects early without
blocking others)

Multiple connections problem / Versioning operations.
Could be solved by using the same connection for versioning.

Versioning operations are not fully transactional.
Could be solved by using the same connection for versioning.

Change resources sequence during transaction commit.
Could be solved by using the same connection for versioning.

Concurrent read-only access to a session
Unrelated (multiple threads in one session; I would use synchronization)

Handling of binary properties (streams) in QValue interface: unrelated
to this discussion, SPI specific

I didn't find an open issue for: The search index is updated outside
of transactions. This doesn't feel right (I like consistency), but in
practice this is not a problem as long as all saved objects are in the
index: the query engine filters non-existing results. Is this correct?
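To make the filtering I mean concrete, here is a toy simulation in plain Python (not Jackrabbit code; the function and data are made up): the index may still list nodes that were deleted after the last index update, and the query engine drops those stale hits by checking whether the node still exists.

```python
def run_query(index_hits, existing_paths):
    """Return only the hits whose nodes still exist,
    silently dropping stale index entries."""
    return [path for path in index_hits if path in existing_paths]

# /a/deleted was removed after the last index update, but is still indexed.
index_hits = ["/a/node1", "/a/deleted", "/b/node2"]
existing = {"/a/node1", "/b/node2"}
print(run_query(index_hits, existing))  # ['/a/node1', '/b/node2']
```

As long as deleted nodes are filtered like this, an index that lags behind the transaction only costs a little extra work per query, not correctness.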

What do you think about using the same connection for versioning and
regular access? I know it requires refactoring, and a new setting in
repository.xml. Anything else?

I found some more information about MVCC. It looks like PostgreSQL,
Oracle, and newer versions of MS-SQL Server work like this:

- Reading: read the 'base revision of the session' (writers don't block readers)
- Writing: lock the node against other writers, create a new 'version'
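A toy sketch of these two rules in Python (a made-up in-memory store, just to illustrate the semantics): readers pin the revision that was current when their session started, and a writer takes a per-node lock and publishes a new revision, so readers are never blocked.

```python
import threading

class MvccStore:
    """Toy MVCC store: readers see the revision current when their
    session started; writers lock the node and publish a new revision."""
    def __init__(self):
        self._revisions = [{}]           # revision 0: empty store
        self._node_locks = {}
        self._mutex = threading.Lock()

    def begin(self):
        return len(self._revisions) - 1  # the session's base revision

    def read(self, base_revision, node):
        return self._revisions[base_revision].get(node)

    def write(self, node, value):
        lock = self._node_locks.setdefault(node, threading.Lock())
        with lock:                       # blocks other *writers* of this node
            with self._mutex:
                head = dict(self._revisions[-1])
                head[node] = value
                self._revisions.append(head)  # readers are never blocked

store = MvccStore()
r0 = store.begin()
store.write("node1", 4)                   # a writer creates revision 1
print(store.read(r0, "node1"))            # None: reader keeps its base revision
print(store.read(store.begin(), "node1")) # 4: a new session sees the write
```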

Using write locks avoids the following problem:

- Session A starts a transaction, updates Node 1 (x=4)
- Session B starts a transaction, updates Node 1 (x=5), commits (saves)
- Session A does some more work, tries to commit -> Exception
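The conflict can be simulated in a few lines of Python (again a made-up model, not Jackrabbit code): each transaction remembers the node's revision at start, and commit fails if somebody else committed in between.

```python
class ConflictError(Exception):
    pass

class OptimisticNode:
    def __init__(self, value):
        self.value = value
        self.revision = 0

class Txn:
    """Toy optimistic transaction on a single node."""
    def __init__(self, node):
        self.node = node
        self.base = node.revision   # revision at transaction start
        self.pending = None

    def update(self, value):
        self.pending = value

    def commit(self):
        if self.node.revision != self.base:  # somebody committed in between
            raise ConflictError("node changed since transaction start")
        self.node.value = self.pending
        self.node.revision += 1

node1 = OptimisticNode(value=3)
a = Txn(node1); a.update(4)   # session A updates x=4
b = Txn(node1); b.update(5)   # session B updates x=5
b.commit()                    # B commits first
try:
    a.commit()                # A's commit -> exception
except ConflictError as e:
    print("commit failed:", e)
```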

Theoretically, session A should catch the exception and retry. But
many applications expect it to work (it works now). Also, retrying
will not work if the transaction is long and Node 1 is updated a lot
by other sessions (let's say it is a counter). That's why I would use
locks for writes. MVCC is used for reading, so readers don't block
writers (like they do now?), resulting in good concurrency for most
workloads.
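The retry problem can be sketched like this (a toy Python model, with competing writers simulated deterministically): every time another session commits between our read and our commit, the optimistic commit fails and we have to start over.

```python
class ConflictError(Exception):
    pass

class ContendedCounter:
    """Toy contended node: another session bumps the counter between
    our read and our commit for the first `interference` attempts."""
    def __init__(self, interference):
        self.value = 0
        self.revision = 0
        self._interference = interference

    def read(self):
        return self.value, self.revision

    def commit(self, base_revision, new_value):
        if self._interference > 0:
            self._interference -= 1
            self.revision += 1
            self.value += 1               # a competing writer won the race
        if self.revision != base_revision:
            raise ConflictError("node changed since transaction start")
        self.value = new_value
        self.revision += 1

def increment_with_retry(counter):
    """What session A would have to do: catch the exception and retry."""
    attempts = 0
    while True:
        attempts += 1
        value, rev = counter.read()
        try:
            counter.commit(rev, value + 1)
            return attempts
        except ConflictError:
            continue

counter = ContendedCounter(interference=3)
print(increment_with_retry(counter))  # 4: three failed attempts, then success
```

With a write lock instead, the first attempt simply waits for the competing writer and then succeeds; no retry loop is needed in the application.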

Explicit write locks: Sometimes an application doesn't need to update
a node but wants to ensure it's not updated by somebody else. This
feature is not that important; in databases, this is SELECT ... FOR
UPDATE, and most people don't really need it. This case is not
documented in the JCR API specs, but Jackrabbit could add a write lock
when calling Item.save() (even when no changes are made).
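What such an unconditional write lock on save() would amount to, as a toy Python sketch (the Session/Node classes here are made up, not the JCR API): the first session's save() takes the per-node lock even though nothing changed, and other writers are excluded until it commits, much like SELECT ... FOR UPDATE.

```python
import threading

class Node:
    """Toy node with a per-node write lock."""
    def __init__(self):
        self._lock = threading.Lock()

class Session:
    """Toy session: save() locks the node even with no changes;
    commit() releases all locks held by the session."""
    def __init__(self):
        self._held = []

    def save(self, node):
        if not node._lock.acquire(blocking=False):
            raise RuntimeError("node is write-locked by another session")
        self._held.append(node)

    def commit(self):
        for node in self._held:
            node._lock.release()
        self._held.clear()

node = Node()
a, b = Session(), Session()
a.save(node)                 # A locks the node without changing it
try:
    b.save(node)             # B cannot write until A commits
except RuntimeError as e:
    print(e)
a.commit()
b.save(node)                 # now succeeds
b.commit()
```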


P.S. If somebody wants to cross-post it to Lucene and Derby, feel
free. I think the requirements of Lucene and Derby are different, but
I might be wrong.
