jackrabbit-oak-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig <mdue...@apache.org>
Subject Re: Handling copies and moves with tree diffs
Date Tue, 27 Nov 2012 11:18:13 GMT


On 27.11.12 10:56, Jukka Zitting wrote:
> Hi,
> As discussed at length earlier, in oak-core we only keep track of tree
> states instead of change logs, i.e. the sequences of changes that lead
> from one state to another. This works pretty well in general and
> avoids a lot of extra complexity, but poses the question of how to
> best produce the JSOP change log that the MicroKernel expects.
> The default case where content is simply added, modified or removed is
> easily handled by the existing JsopDiff class that maps a tree diff to
> the corresponding JSOP operations. The only big problem with this
> approach is that it doesn't handle bulk operations like copies and
> moves too well, instead it falls back to large add(+remove)
> operations. Currently we work around this problem by mapping each
> higher-level copy and move operation to separate branch commits that
> contain just that operation, which isn't too ideal.
> Michael already did quite a bit of work on this problem earlier and
> realized that it's essentially impossible to reliably extract all
> copies and moves from just a tree diff. He did come up with a pretty
> brilliant mechanism for handling that case when we do have the
> original change log, but I'd really like to avoid having to keep track
> of all changes.

Thanks Jukka for picking up the ball! Our private chat from last week 
triggered some new ideas on my side. Basically I

* reviewed the preconditions: what information do we have or could we 
make available? How does this differ from the preconditions I based my 
original work on change log on [1, 2]?

* Applied the knowledge gathered while working on my change log based 
solution upon the new situation.

> Thus I looked at how we could leverage the following two
> simplifications to the problem posed by a general tree diff:
> 1) For each tree node we keep track of its original location. For
> example in KernelNodeState we have a record of the path at which the
> node existed in the base revision.

Right, this matches my observation. However, I think recording the path 
is not sufficient. We need to record the parent node.

> 2) We don't need optimal handling of all possible corner cases. As
> long as we cover basic copy and move use cases we can let the handling
> of trickier situations (like replacing a tree with one of its
> subtrees, or swapping two subtrees) degrade to the fallback mechanism
> we already have in JsopDiff. The lack of fully accurate move tracking
> may cause difficulty in merging concurrent changes over moves, but I
> think a few extra potential conflicts over non-trivial moves is a
> reasonable price to pay.

If we record the original parent node for all moved/copied nodes, I 
think it is possible to come up with a change log which is not larger 
than twice the size of a minimal change log. The tricky part here is to 
find out in what order to execute copy and move operations.

In general the size of the change log should be acceptable and for most 
common cases I think it will be even minimal. If we really think we need 
to squeeze the change logs down to minimal, there is sill my initial 
change log consolidator [1, 2].

> I did some work along these lines already earlier with the
> CopyAndMoveAwareJsopDiff class you can find inside KernelRootBuilder,
> but couldn't make it work properly. Based on discussions with Michael
> last week I think we should be able to come up with an algorithm that
> works pretty well for this purpose. More on that in follow-ups.

I'll also sketch my idea an follow up with a separate message.


[1] http://markmail.org/message/3c22jqy6gzzkyxx2
> BR,
> Jukka Zitting

View raw message