jackrabbit-dev mailing list archives

From Thomas Müller <thomas.muel...@day.com>
Subject Re: [jr3] EventJournal / who merges changes
Date Thu, 25 Feb 2010 13:38:48 GMT
There are two kinds of merge: "low level merge" and "high level merge".
A "low level merge" is problematic because it can result in unexpected
behavior. I would even say the way Jackrabbit currently merges changes
(by looking at the data itself, not at the operations) is problematic.

Example: Currently, orderBefore cannot be done at the same time as
addNode or another orderBefore. I'm not saying this is important, but
it's one case that is complex. Another example: Let's say the low
level representation would split nodes if there are more than 1000
child nodes (adding one layer of hidden internal nodes). That means
adding a node to a list of 1000 nodes could cause a (b-tree) split.
If two sessions do that concurrently, it will get messy. Session 1 will
create new internal nodes, session 2 will create new internal nodes as
well (but different ones), and merging the result will (probably)
duplicate all 1000 nodes. Or worse.

The idea is to _not_ try to merge by looking at the data, but to merge by
re-applying the operation. If saving the new data fails (detected by
comparing timestamps/version numbers), then refresh the data and re-apply
the operation ("orderBefore", "addNode",...). This is relatively easy
to implement, and works in more cases than what Jackrabbit can do now.
Jackrabbit needs to keep the EventJournal anyway, so this will not
use more memory.
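
To make the refresh-and-re-apply idea concrete, here is a minimal sketch in
Python (not Jackrabbit code). Everything in it is an assumption made for
illustration: the Store and Session classes, the single version counter per
node, and the add_node / order_before helpers are hypothetical stand-ins for
the real persistence layer and the operations recorded in the journal.

```python
from dataclasses import dataclass, field
from typing import List


class StaleVersionError(Exception):
    """Raised when a save is based on an outdated version of the data."""


@dataclass
class Store:
    """Hypothetical store: one node's ordered child list plus a version."""
    children: List[str] = field(default_factory=list)
    version: int = 0

    def read(self):
        # Hand out a private copy and the version it was read at.
        return list(self.children), self.version

    def save(self, children, base_version):
        # Optimistic check: reject the write if someone else saved first.
        if base_version != self.version:
            raise StaleVersionError()
        self.children = list(children)
        self.version += 1


def add_node(name):
    """Hypothetical operation: append a child node."""
    def op(children):
        children.append(name)
    return op


def order_before(src, dest):
    """Hypothetical operation: move child src directly before child dest."""
    def op(children):
        children.remove(src)
        children.insert(children.index(dest), src)
    return op


class Session:
    """Keeps a journal of operations so they can be replayed on conflict."""

    def __init__(self, store):
        self.store = store
        self.children, self.base_version = store.read()
        self.journal = []

    def apply(self, op):
        op(self.children)
        self.journal.append(op)

    def save(self, max_retries=3):
        for _ in range(max_retries + 1):
            try:
                self.store.save(self.children, self.base_version)
            except StaleVersionError:
                # Do not merge data: refresh, then re-apply the operations.
                # An operation that no longer makes sense (for example its
                # target node was removed) raises here, which is a real
                # conflict rather than something to merge silently.
                self.children, self.base_version = self.store.read()
                for op in self.journal:
                    op(self.children)
            else:
                self.children, self.base_version = self.store.read()
                self.journal.clear()
                return
        raise StaleVersionError("gave up after repeated conflicts")


if __name__ == "__main__":
    store = Store(children=["a", "b", "c"])
    s1, s2 = Session(store), Session(store)
    s1.apply(add_node("d"))
    s2.apply(order_before("c", "a"))
    s1.save()
    s2.save()  # first attempt is stale; the reorder is replayed on top of s1
    print(store.children)  # ['c', 'a', 'b', 'd']
```

In this sketch the concurrent addNode and orderBefore both succeed, because
the second session never compares child lists; it only replays its own
journal against the refreshed state.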

This is not a new idea, it is how MVCC works (at least how I
understand it). From
http://en.wikipedia.org/wiki/Multiversion_concurrency_control  - "if a
transaction [fails], the transaction ... is aborted and restarted."

