Hi,
I just committed a proof of concept implementation of below algorithm to
my GitHub repository at https://github.com/mduerig/json-diff. For
maximal clarity this is a standalone implementation outside the Oak
framework.
Michael
On 27.11.12 13:16, Michael Dürig wrote:
>
> Hi,
>
> As mentioned in my previous message, here is a draft of an algorithm
> which should be capable of creating a change log from two trees, such
> that when the change log is applied to the source tree it will transform
> it to the destination tree. The algorithm is capable of coping with copy
> and move operations given that the source node of such operations is
> tracked.
>
> The algorithm relies on a couple of observations:
>
> * Deletions of children needs to be handled before addition in order to
> avoid conflicting child names.
>
> * Actual deletion needs to be deferred, since child nodes of a deleted
> node might still be be referred by a move operation which is only
> detected later in the differencing process. The algorithm does this by
> moving such nodes to a "trash" node which is removed again at the very
> end. This additional move operation might in the worst case cause each
> moved node to be moved twice leading to a (IMO negligible) blow up of
> the change log.
>
> * Reconstructing a change log relies on incrementally applying detected
> changes to the source tree until it matches the destination tree. This
> is the place where tracking of source nodes of move and copy operations
> is crucial.
>
>
> The pseudo code of the algorithm abstracts away from the current data
> structures we use in Oak and is only concerned with nodes. Its main
> purpose is to demonstrate the core of the algorithm so there might be
> still lots of room for improvements and optimisations. Also the tree
> structures used in the algorithm are mutable. However it should
> eventually easily to adapt these to NodeBuilders.
>
> Michael
>
> // Global variable holding the JSOP journal after the diffTree below
> returns.
> jsop = ""
>
> /*
> Create a JSOP journal, which when applied to tree S will transform
> it to tree T.
> */
> diffTrees(S, T) {
> // Create a location (trash) which will temporarily hold removed nodes.
> // This is necessary since these (or child nodes thereof) might still be
> // needed in move operations occurring only later in the differencing
> process.
> X = S.addNode(createUniqueName)
>
> // The main differencing process starts at the roots of the trees and
> // progresses recursively level by level.
> diffNodes(X, S, T)
>
> // Remove the trash location and all its content
> jsop += "-" + X.path
> }
>
> /*
> Create JSOP operations for the differences of the immediate children
> of trees S and T. Tree X serves as trash.
> */
> diffNode(X, S, T) {
> deleted = S.childNames \ T.childNames // set difference
> added = T.childNames \ S.childNames
>
> // Need to take care of deleted nodes first in order to avoid
> // name clashes when adding new nodes later.
> for (d : deleted) {
> t = S.child(d)
> n = createUniqueName
>
> // Deleted nodes are moved to trash.
> t.moveTo(X, n) // Move node t to parent X with name n
> op = ">" + t.sourceNode.path + ":" + t.path
> jsop += op
> S.apply(op) // Transform S according to the single op
> }
>
> for (a : added) {
> t = T.child(a)
>
> // assume we can detect a copied node and know its source node
> if (isCopied(t)) {
> op = "*" + t.sourceNode.path + ":" + t.path
> }
>
> // assume we can detect a moved node and know its source node
> else if (isMoved(t)) {
> op = ">" + t.sourceNode.path + ":" + t.path
> }
>
> // this is an added node
> else {
> op = "+" + t.path
> }
>
> jsop += op
> S.apply(op) // Transform S according to the single op
> }
>
> // assert S.childNames == T.childNames
> for (c : T.childNames) {
> diffNode(S.child(c), T.child(c))
> }
> }
>
>
>
>
> On 27.11.12 10:56, Jukka Zitting wrote:
>> Hi,
>>
>> As discussed at length earlier, in oak-core we only keep track of tree
>> states instead of change logs, i.e. the sequences of changes that lead
>> from one state to another. This works pretty well in general and
>> avoids a lot of extra complexity, but poses the question of how to
>> best produce the JSOP change log that the MicroKernel expects.
>>
>> The default case where content is simply added, modified or removed is
>> easily handled by the existing JsopDiff class that maps a tree diff to
>> the corresponding JSOP operations. The only big problem with this
>> approach is that it doesn't handle bulk operations like copies and
>> moves too well, instead it falls back to large add(+remove)
>> operations. Currently we work around this problem by mapping each
>> higher-level copy and move operation to separate branch commits that
>> contain just that operation, which isn't too ideal.
>>
>> Michael already did quite a bit of work on this problem earlier and
>> realized that it's essentially impossible to reliably extract all
>> copies and moves from just a tree diff. He did come up with a pretty
>> brilliant mechanism for handling that case when we do have the
>> original change log, but I'd really like to avoid having to keep track
>> of all changes.
>>
>> Thus I looked at how we could leverage the following two
>> simplifications to the problem posed by a general tree diff:
>>
>> 1) For each tree node we keep track of its original location. For
>> example in KernelNodeState we have a record of the path at which the
>> node existed in the base revision.
>>
>> 2) We don't need optimal handling of all possible corner cases. As
>> long as we cover basic copy and move use cases we can let the handling
>> of trickier situations (like replacing a tree with one of its
>> subtrees, or swapping two subtrees) degrade to the fallback mechanism
>> we already have in JsopDiff. The lack of fully accurate move tracking
>> may cause difficulty in merging concurrent changes over moves, but I
>> think a few extra potential conflicts over non-trivial moves is a
>> reasonable price to pay.
>>
>> I did some work along these lines already earlier with the
>> CopyAndMoveAwareJsopDiff class you can find inside KernelRootBuilder,
>> but couldn't make it work properly. Based on discussions with Michael
>> last week I think we should be able to come up with an algorithm that
>> works pretty well for this purpose. More on that in follow-ups.
>>
>> BR,
>>
>> Jukka Zitting
>>