jackrabbit-dev mailing list archives

From: Ian Boston <...@tfd.co.uk>
Subject: Re: [jr3] EventJournal / who merges changes
Date: Fri, 26 Feb 2010 12:12:09 GMT
Sorry for top-posting; I am not certain where to put this "request".

Currently, adding child nodes is almost serialised, since it's not possible to merge concurrent
changes to a single multi-valued property.
*If* MVCC with abort on conflict is going to make this situation worse, then that IMHO would
be a mistake.
If, however, the probability of conflict when updating a multi-valued property is reduced,
then that would be good (i.e. giving certain properties a different storage layout that avoids
conflicts; I think you allude to this).
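
For illustration, a minimal sketch (not our actual code) of why such updates conflict today: two sessions that each append a value to the same multi-valued property cannot have their saves merged, so the second save fails. The class, the "members" property and the group path are assumptions; only the JCR API calls are real.

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.Value;
import javax.jcr.ValueFactory;

// Illustrative only: append one value to a multi-valued property.
// If another session has appended to the same property since we read it,
// save() typically fails with InvalidItemStateException instead of the
// two appends being merged; this is the serialisation problem above.
public final class MemberListAppender {

    public static void appendMember(Session session, String groupPath, String member)
            throws RepositoryException {
        Node group = session.getNode(groupPath);            // e.g. "/groups/staff" (assumed)
        Value[] current = group.getProperty("members").getValues();
        ValueFactory vf = session.getValueFactory();
        Value[] updated = new Value[current.length + 1];
        System.arraycopy(current, 0, updated, 0, current.length);
        updated[current.length] = vf.createValue(member);
        group.setProperty("members", updated);
        session.save();                                      // conflicts are not merged
    }
}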

e.g.
At the moment (JR16), when adding users to our Jackrabbit (Sling) based system, we have to
do this single-threaded to avoid conflicts, since even with 3 threads conflicts are far too
common. To reduce contention we put the new nodes into a sharded tree (e.g. .../ff/ff/ff/ff/user_node),
but we still get lots of contention, estimated at 1 in 20 operations for the first 20K
users, and worse at the start. (BTW, the number of users ranges from 10K to 4M.)
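
For completeness, a minimal sketch of that sharded layout, written against the plain JCR API; the class name, the /users root, the shard depth and the nt:unstructured node type are all assumptions, not our actual Sling code.

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

// Illustrative only: spread new user nodes across a hashed, sharded subtree
// (e.g. /users/1a/2b/3c/4d/jdoe) so that concurrent saves rarely touch the
// same parent node.
public final class ShardedUserStore {

    private static final String USERS_ROOT = "/users";    // assumed root path

    public static Node createUser(Session session, String userId)
            throws RepositoryException {
        // Four two-character shard segments derived from a hash of the user id,
        // matching the .../ff/ff/ff/ff/user_node layout described above.
        String hex = String.format("%08x", userId.hashCode());
        Node parent = session.getNode(USERS_ROOT);
        for (int i = 0; i < 8; i += 2) {
            String shard = hex.substring(i, i + 2);
            parent = parent.hasNode(shard)
                    ? parent.getNode(shard)
                    : parent.addNode(shard, "nt:unstructured");
        }
        Node user = parent.addNode(userId, "nt:unstructured");
        session.save();    // even with sharding this save can still conflict
        return user;
    }
}

Presumably the remaining contention comes from the intermediate shard nodes, which still have to be created under shared parents; that would also explain why it is worse at the start.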

Ian
On 25 Feb 2010, at 13:38, Thomas Müller wrote:

> There are "low level merges" and "high level merges". A "low level
> merge" is problematic because it can result in unexpected behavior. I would
> even say the way Jackrabbit merges changes currently (by looking at
> the data itself, not at the operations) is problematic.
> 
> Example: Currently, orderBefore can not be done at the same time as
> addNode or another orderBefore. I'm not saying this is important, but
> it's one case that is complex. Another example: Let's say the low
> level representation would split nodes if there are more than 1000
> child nodes (add one layer of hidden internal nodes). That means
> adding a node to a list of 1000 nodes could cause a (b-tree-) split.
> If two sessions do that concurrently it will get messy. Session 1 will
> create new internal nodes, session 2 will create new internal nodes as
> well (but different ones), and merging the result will (probably)
> duplicate all 1000 nodes. Or worse.
> 
> The idea is to _not_ try to merge by looking at the data, but merge by
> re-applying the operation. If saving the new data fails (by looking at
> the timestamp/version numbers), then refresh the data, and re-apply
> the operation ("orderBefore", "addNode",...). This is relatively easy
> to implement, and works in more cases than what Jackrabbit can do now.
> Jackrabbit anyway needs to keep the EventJournal, so this will not
> use more memory.
> 
> This is not a new idea, it is how MVCC works (at least how I
> understand it). From
> http://en.wikipedia.org/wiki/Multiversion_concurrency_control  - "if a
> transaction [fails], the transaction ... is aborted and restarted."
> 
> Regards,
> Thomas
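
For what it's worth, a minimal sketch of the "re-apply the operation" loop Thomas describes above, assuming the JCR Session API; the JcrOperation callback, the class name and the retry limit are illustrative, not anything that exists in Jackrabbit today.

import javax.jcr.InvalidItemStateException;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

// Illustrative sketch of "merge by re-applying the operation": if save()
// fails because another session changed the same items, discard the local
// changes, refresh from persistent storage, and run the operation again
// against the fresh state.
public final class RetryingWriter {

    // Hypothetical callback describing one high-level operation
    // ("addNode", "orderBefore", ...) against a session.
    public interface JcrOperation {
        void apply(Session session) throws RepositoryException;
    }

    public static void write(Session session, JcrOperation op, int maxAttempts)
            throws RepositoryException {
        for (int attempt = 1; ; attempt++) {
            op.apply(session);
            try {
                session.save();
                return;                      // saved without conflict
            } catch (InvalidItemStateException conflict) {
                if (attempt >= maxAttempts) {
                    throw conflict;          // give up and surface the conflict
                }
                session.refresh(false);      // drop local changes, re-read, retry
            }
        }
    }
}

The point being that the operation itself is re-run on refreshed data, rather than the persisted data of two sessions being merged.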

