jackrabbit-oak-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: SegmentNodeStore merge operations
Date Thu, 07 Mar 2013 09:16:36 GMT

>However, as noted in OAK-633, there are a few conceptual problems with
>this approach to processing merges:
>a) Since validators and other commit hooks are not run during the
>merge, the result can be an internally inconsistent content tree
>(dangling references, incorrect permission store, etc.)
>b) The presence of conflict markers will prevent further changes to
>affected nodes until the conflict gets resolved
>c) There's no good way to handle more than one set of conflicts per node
>So, apart from problem a (which also affects the new MongoMK), the
>current mechanism works fine (i.e. fully parallel writes) as long as
>the changes are non-conflicting, but runs into trouble when there are

Sorry, I don't understand: how does SegmentNodeStore merging affect the new
MongoMK? Please note I was talking about SegmentNodeStore merge
operations, not MicroKernel.merge. The MongoMK doesn't merge segments and
journals. Instead, conflicts are detected at the node level when committing
(relying on MongoDB features).
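To make "conflicts are detected at the node level when committing" concrete, here is a minimal sketch of optimistic, per-node conflict detection. It is not MongoMK's actual code: the `DocumentStore` and `NodeDocument` classes, the revision counter and `conditional_update` are hypothetical, and the lock only stands in for the atomicity MongoDB gives a single-document update whose filter includes the expected revision.

```python
import threading
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a node was changed by someone else since it was read."""


@dataclass
class NodeDocument:
    # Hypothetical per-node document: properties plus a revision counter.
    props: dict = field(default_factory=dict)
    revision: int = 0


class DocumentStore:
    """Toy store (assumption): the lock stands in for MongoDB's atomic
    single-document update with a condition on the revision."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def read(self, path):
        with self._lock:
            doc = self._docs.setdefault(path, NodeDocument())
            return dict(doc.props), doc.revision

    def conditional_update(self, path, expected_revision, changes):
        """Compare-and-set: apply the changes only if nobody committed to
        this node since expected_revision was read."""
        with self._lock:
            doc = self._docs.setdefault(path, NodeDocument())
            if doc.revision != expected_revision:
                raise ConflictError(
                    f"{path}: expected revision {expected_revision}, "
                    f"found {doc.revision}")
            doc.props.update(changes)
            doc.revision += 1
            return doc.revision
```

Two sessions that read `/content/a` at the same revision can't both commit: the second `conditional_update` raises `ConflictError`, while commits touching different nodes proceed in parallel. Nothing is merged after the fact; the conflict surfaces at commit time.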

>* Use a more aggressive merge algorithm that automatically resolves
>all conflicts by throwing away (or storing somewhere else) "less
>important" changes when needed. Addresses problems b and c, problem a
>still an issue.

I'm worried our customers won't like this. It's very different from the
behaviour of regular databases (be it relational databases, or NoSQL
databases such as MongoDB). If it's configurable for a certain subtree,
for improved performance, then it's acceptable in my view, but even then
I'm worried about the added complexity on the user/customer/developer
side. And I'm worried that if we need to enable it to get a scalable
solution, it would turn people away.

In my view, SegmentNodeStore merging is somewhat similar to database
synchronization (such as synchronizing a smartphone calendar with the
desktop). A long time ago, I worked on such a database
synchronization solution, called PointBase UniSync and MicroSync. It used a
hub-and-spoke model and supported multiple types of conflicts
(insert/insert, update/update, update/delete, delete/update; delete/delete,
for example, was not treated as a conflict). Multiple conflict resolution
algorithms were supported (spoke wins, hub wins, and user-defined using a
resolver callback). Interestingly, the documentation is still available at
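The conflict types and resolution strategies listed above can be sketched as follows. This is not the UniSync/MicroSync API; `Op`, `synchronize`, `spoke_wins`, `hub_wins` and `keep_updates` are hypothetical names, and a "change" is assumed to be a pair of (operation, new value) for a single row on each side.

```python
from enum import Enum


class Op(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


# Conflict types from the message; delete/delete is deliberately absent.
CONFLICT_TYPES = {
    (Op.INSERT, Op.INSERT),
    (Op.UPDATE, Op.UPDATE),
    (Op.UPDATE, Op.DELETE),
    (Op.DELETE, Op.UPDATE),
}


def spoke_wins(spoke, hub):
    return spoke


def hub_wins(spoke, hub):
    return hub


def keep_updates(spoke, hub):
    """Hypothetical user-defined resolver: an update on the spoke beats a
    delete on the hub; otherwise the hub wins."""
    if spoke[0] is Op.UPDATE and hub[0] is Op.DELETE:
        return spoke
    return hub


def synchronize(spoke_change, hub_change, resolver):
    """Return the change both sides should end up with for one row.
    A change is (Op, value), or None if that side didn't touch the row."""
    if spoke_change is None:
        return hub_change
    if hub_change is None:
        return spoke_change
    pair = (spoke_change[0], hub_change[0])
    if pair == (Op.DELETE, Op.DELETE):
        return hub_change  # both sides deleted the row: nothing to resolve
    if pair not in CONFLICT_TYPES:
        raise ValueError(f"unexpected combination for one row: {pair}")
    return resolver(spoke_change, hub_change)
```

The resolver is just a callback, so "spoke wins", "hub wins" and custom rules share one code path; only changes classified as conflicts ever reach it.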

As far as I know, NoSQL databases either try to avoid
merging/synchronization (MongoDB: writes always happen on the primary), or
do it in a very simple way. For example, in Cassandra, if concurrent writes
are enabled, the latest change always wins:
http://www.datastax.com/docs/1.1/dml/about_writes "The latest timestamp
always wins".
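For comparison, "the latest timestamp always wins" amounts to a last-write-wins merge, sketched below per column. The `Cell`, `lww_merge` and `merge_rows` names are hypothetical, and breaking equal-timestamp ties by comparing values is an assumption made here so every replica picks the same winner; check the Cassandra documentation for its actual tie-break rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # writer-supplied, e.g. microseconds since the epoch


def lww_merge(a: Cell, b: Cell) -> Cell:
    """Last-write-wins: the cell with the higher timestamp survives.
    Ties are broken by value (assumption) so the result is deterministic."""
    return max(a, b, key=lambda c: (c.timestamp, c.value))


def merge_rows(*rows):
    """Merge replicas column by column; each row maps column name -> Cell."""
    merged = {}
    for row in rows:
        for col, cell in row.items():
            merged[col] = lww_merge(merged[col], cell) if col in merged else cell
    return merged
```

Merging `{"title": Cell("draft", 100), "author": Cell("tom", 100)}` with `{"title": Cell("final", 200)}` keeps "final" as the title and "tom" as the author, in either order. The older title is dropped silently; no conflict is ever reported.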
