jackrabbit-oak-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig <mdue...@apache.org>
Subject Re: Handling copies and moves with tree diffs
Date Tue, 27 Nov 2012 13:16:44 GMT


As mentioned in my previous message, here is a draft of an algorithm 
which should be capable of creating a change log from two trees, such 
that when the change log is applied to the source tree it will transform 
it to the destination tree. The algorithm is capable of coping with copy 
and move operations given that the source node of such operations is 

The algorithm relies on a couple of observations:

* Deletions of children needs to be handled before addition in order to 
avoid conflicting child names.

* Actual deletion needs to be deferred, since child nodes of a deleted 
node might still be be referred by a move operation which is only 
detected later in the differencing process. The algorithm does this by 
moving such nodes to a "trash" node which is removed again at the very 
end. This additional move operation might in the worst case cause each 
moved node to be moved twice leading to a (IMO negligible) blow up of 
the change log.

* Reconstructing a change log relies on incrementally applying detected 
changes to the source tree until it matches the destination tree. This 
is the place where tracking of source nodes of move and copy operations 
is crucial.

The pseudo code of the algorithm abstracts away from the current data 
structures we use in Oak and is only concerned with nodes. Its main 
purpose is to demonstrate the core of the algorithm so there might be 
still lots of room for improvements and optimisations. Also the tree 
structures used in the algorithm are mutable. However it should 
eventually easily to adapt these to NodeBuilders.


// Global variable holding the JSOP journal after the diffTree below 
jsop = ""

   Create a JSOP journal, which when applied to tree S will transform
   it to tree T.
diffTrees(S, T) {
   // Create a location (trash) which will temporarily hold removed nodes.
   // This is necessary since these (or child nodes thereof) might still be
   // needed in move operations occurring only later in the differencing 
   X = S.addNode(createUniqueName)

   // The main differencing process starts at the roots of the trees and
   // progresses recursively level by level.
   diffNodes(X, S, T)

   // Remove the trash location and all its content
   jsop += "-" + X.path

   Create JSOP operations for the differences of the immediate children
   of trees S and T. Tree X serves as trash.
diffNode(X, S, T) {
   deleted = S.childNames \ T.childNames   // set difference
   added   = T.childNames \ S.childNames

   // Need to take care of deleted nodes first in order to avoid
   // name clashes when adding new nodes later.
   for (d : deleted) {
     t = S.child(d)
     n = createUniqueName

     // Deleted nodes are moved to trash.
     t.moveTo(X, n) // Move node t to parent X with name n
     op = ">" + t.sourceNode.path + ":" + t.path
     jsop += op
     S.apply(op)               // Transform S according to the single op

   for (a : added) {
     t = T.child(a)

     // assume we can detect a copied node and know its source node
     if (isCopied(t)) {
       op = "*" + t.sourceNode.path + ":" + t.path

     // assume we can detect a moved node and know its source node
     else if (isMoved(t)) {
       op = ">" + t.sourceNode.path + ":" + t.path

     // this is an added node
     else {
       op = "+" + t.path

     jsop += op
     S.apply(op)               // Transform S according to the single op

   // assert S.childNames == T.childNames
   for (c : T.childNames) {
     diffNode(S.child(c), T.child(c))

On 27.11.12 10:56, Jukka Zitting wrote:
> Hi,
> As discussed at length earlier, in oak-core we only keep track of tree
> states instead of change logs, i.e. the sequences of changes that lead
> from one state to another. This works pretty well in general and
> avoids a lot of extra complexity, but poses the question of how to
> best produce the JSOP change log that the MicroKernel expects.
> The default case where content is simply added, modified or removed is
> easily handled by the existing JsopDiff class that maps a tree diff to
> the corresponding JSOP operations. The only big problem with this
> approach is that it doesn't handle bulk operations like copies and
> moves too well, instead it falls back to large add(+remove)
> operations. Currently we work around this problem by mapping each
> higher-level copy and move operation to separate branch commits that
> contain just that operation, which isn't too ideal.
> Michael already did quite a bit of work on this problem earlier and
> realized that it's essentially impossible to reliably extract all
> copies and moves from just a tree diff. He did come up with a pretty
> brilliant mechanism for handling that case when we do have the
> original change log, but I'd really like to avoid having to keep track
> of all changes.
> Thus I looked at how we could leverage the following two
> simplifications to the problem posed by a general tree diff:
> 1) For each tree node we keep track of its original location. For
> example in KernelNodeState we have a record of the path at which the
> node existed in the base revision.
> 2) We don't need optimal handling of all possible corner cases. As
> long as we cover basic copy and move use cases we can let the handling
> of trickier situations (like replacing a tree with one of its
> subtrees, or swapping two subtrees) degrade to the fallback mechanism
> we already have in JsopDiff. The lack of fully accurate move tracking
> may cause difficulty in merging concurrent changes over moves, but I
> think a few extra potential conflicts over non-trivial moves is a
> reasonable price to pay.
> I did some work along these lines already earlier with the
> CopyAndMoveAwareJsopDiff class you can find inside KernelRootBuilder,
> but couldn't make it work properly. Based on discussions with Michael
> last week I think we should be able to come up with an algorithm that
> works pretty well for this purpose. More on that in follow-ups.
> BR,
> Jukka Zitting

View raw message