jackrabbit-oak-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig <mic...@gmail.com>
Subject Re: Handling copies and moves with tree diffs
Date Tue, 27 Nov 2012 14:03:25 GMT

Hi,

to track this more easily I committed the pseudo code to svn: 
http://svn.apache.org/viewvc/jackrabbit/oak/trunk/doc/jsop-diff.md?view=markup

A small correction to my initial post is here: 
http://svn.apache.org/viewvc/jackrabbit/oak/trunk/doc/jsop-diff.md?r1=1414189&r2=1414190&pathrev=1414190&view=diff&diff_format=f

Michael


On 27.11.12 13:16, Michael Dürig wrote:
>
> Hi,
>
> As mentioned in my previous message, here is a draft of an algorithm
> which should be capable of creating a change log from two trees, such
> that when the change log is applied to the source tree it will transform
> it to the destination tree. The algorithm is capable of coping with copy
> and move operations given that the source node of such operations is
> tracked.
>
> The algorithm relies on a couple of observations:
>
> * Deletions of children needs to be handled before addition in order to
> avoid conflicting child names.
>
> * Actual deletion needs to be deferred, since child nodes of a deleted
> node might still be be referred by a move operation which is only
> detected later in the differencing process. The algorithm does this by
> moving such nodes to a "trash" node which is removed again at the very
> end. This additional move operation might in the worst case cause each
> moved node to be moved twice leading to a (IMO negligible) blow up of
> the change log.
>
> * Reconstructing a change log relies on incrementally applying detected
> changes to the source tree until it matches the destination tree. This
> is the place where tracking of source nodes of move and copy operations
> is crucial.
>
>
> The pseudo code of the algorithm abstracts away from the current data
> structures we use in Oak and is only concerned with nodes. Its main
> purpose is to demonstrate the core of the algorithm so there might be
> still lots of room for improvements and optimisations. Also the tree
> structures used in the algorithm are mutable. However it should
> eventually easily to adapt these to NodeBuilders.
>
> Michael
>
> // Global variable holding the JSOP journal after the diffTree below
> returns.
> jsop = ""
>
> /*
>    Create a JSOP journal, which when applied to tree S will transform
>    it to tree T.
> */
> diffTrees(S, T) {
>    // Create a location (trash) which will temporarily hold removed nodes.
>    // This is necessary since these (or child nodes thereof) might still be
>    // needed in move operations occurring only later in the differencing
> process.
>    X = S.addNode(createUniqueName)
>
>    // The main differencing process starts at the roots of the trees and
>    // progresses recursively level by level.
>    diffNodes(X, S, T)
>
>    // Remove the trash location and all its content
>    jsop += "-" + X.path
> }
>
> /*
>    Create JSOP operations for the differences of the immediate children
>    of trees S and T. Tree X serves as trash.
> */
> diffNode(X, S, T) {
>    deleted = S.childNames \ T.childNames   // set difference
>    added   = T.childNames \ S.childNames
>
>    // Need to take care of deleted nodes first in order to avoid
>    // name clashes when adding new nodes later.
>    for (d : deleted) {
>      t = S.child(d)
>      n = createUniqueName
>
>      // Deleted nodes are moved to trash.
>      t.moveTo(X, n) // Move node t to parent X with name n
>      op = ">" + t.sourceNode.path + ":" + t.path
>      jsop += op
>      S.apply(op)               // Transform S according to the single op
>    }
>
>    for (a : added) {
>      t = T.child(a)
>
>      // assume we can detect a copied node and know its source node
>      if (isCopied(t)) {
>        op = "*" + t.sourceNode.path + ":" + t.path
>      }
>
>      // assume we can detect a moved node and know its source node
>      else if (isMoved(t)) {
>        op = ">" + t.sourceNode.path + ":" + t.path
>      }
>
>      // this is an added node
>      else {
>        op = "+" + t.path
>      }
>
>      jsop += op
>      S.apply(op)               // Transform S according to the single op
>    }
>
>    // assert S.childNames == T.childNames
>    for (c : T.childNames) {
>      diffNode(S.child(c), T.child(c))
>    }
> }
>
>
>
>
> On 27.11.12 10:56, Jukka Zitting wrote:
>> Hi,
>>
>> As discussed at length earlier, in oak-core we only keep track of tree
>> states instead of change logs, i.e. the sequences of changes that lead
>> from one state to another. This works pretty well in general and
>> avoids a lot of extra complexity, but poses the question of how to
>> best produce the JSOP change log that the MicroKernel expects.
>>
>> The default case where content is simply added, modified or removed is
>> easily handled by the existing JsopDiff class that maps a tree diff to
>> the corresponding JSOP operations. The only big problem with this
>> approach is that it doesn't handle bulk operations like copies and
>> moves too well, instead it falls back to large add(+remove)
>> operations. Currently we work around this problem by mapping each
>> higher-level copy and move operation to separate branch commits that
>> contain just that operation, which isn't too ideal.
>>
>> Michael already did quite a bit of work on this problem earlier and
>> realized that it's essentially impossible to reliably extract all
>> copies and moves from just a tree diff. He did come up with a pretty
>> brilliant mechanism for handling that case when we do have the
>> original change log, but I'd really like to avoid having to keep track
>> of all changes.
>>
>> Thus I looked at how we could leverage the following two
>> simplifications to the problem posed by a general tree diff:
>>
>> 1) For each tree node we keep track of its original location. For
>> example in KernelNodeState we have a record of the path at which the
>> node existed in the base revision.
>>
>> 2) We don't need optimal handling of all possible corner cases. As
>> long as we cover basic copy and move use cases we can let the handling
>> of trickier situations (like replacing a tree with one of its
>> subtrees, or swapping two subtrees) degrade to the fallback mechanism
>> we already have in JsopDiff. The lack of fully accurate move tracking
>> may cause difficulty in merging concurrent changes over moves, but I
>> think a few extra potential conflicts over non-trivial moves is a
>> reasonable price to pay.
>>
>> I did some work along these lines already earlier with the
>> CopyAndMoveAwareJsopDiff class you can find inside KernelRootBuilder,
>> but couldn't make it work properly. Based on discussions with Michael
>> last week I think we should be able to come up with an algorithm that
>> works pretty well for this purpose. More on that in follow-ups.
>>
>> BR,
>>
>> Jukka Zitting
>>

Mime
View raw message