jackrabbit-oak-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: Detecting move operations in node state diffs
Date Mon, 21 Oct 2013 14:31:49 GMT

> extra pass

On how to avoid this extra pass: it is not strictly backward compatible, but I
wonder how much it would break. What if observation delivered two
events for moved nodes: the "node moved" event (added at the target) plus
the "node deleted" event (deleted at the source)? The one use case I know
of, data store garbage collection in Jackrabbit core, would be fine with
this behavior.
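A minimal sketch of the two-event proposal, assuming a hypothetical event model: `EventType`, `Event` and `TwoEventMoves` are illustrative names, not the JCR or Oak observation API. One move yields a "moved" event at the target and a "deleted" event at the source, so a listener that only handles deletions still sees the source go away.

```java
import java.util.List;

// Hypothetical event kinds for illustration; not the JCR/Oak observation API.
enum EventType { NODE_ADDED, NODE_DELETED, NODE_MOVED }

// sourcePath is only set for NODE_MOVED events.
record Event(EventType type, String path, String sourcePath) {}

class TwoEventMoves {
    // Under the proposal, one move produces two events: "node moved" at the
    // target and "node deleted" at the source. A listener that only handles
    // NODE_DELETED still sees the source disappear.
    static List<Event> eventsForMove(String sourcePath, String targetPath) {
        return List.of(
                new Event(EventType.NODE_MOVED, targetPath, sourcePath),
                new Event(EventType.NODE_DELETED, sourcePath, null));
    }
}
```

The compatibility cost is on the other side: a listener that does not know NODE_MOVED no longer sees an add at the target.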


On 10/21/13 2:20 PM, "Michael Dürig" <mduerig@apache.org> wrote:

>I implemented a very rough POC of the algorithm outlined below. See [1]
>for the implementation itself. On move, a node is annotated with its
>source path in NodeBuilder.moveTo(). Moves can later be extracted
>through the standalone MoveDetector class; see MoveDetectorTest for
>details. MoveDetector also provides a static utility method,
>findMovedPaths, for building the set of moved nodes that the algorithm
>requires. As mentioned below, this extra pass is not required if this set
>can be obtained by other means.
>See [2] for how this could be integrated with the current observation
>implementation. If we decide to go with such an approach at all, we still
>need to figure out how to better integrate it with the current node state diff.
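To make the annotation idea concrete, here is a toy model (not the POC in [1], and not Oak's NodeBuilder): a transient tree stored as path to properties, a `moveTo` that relocates a subtree and records the source path on its root, and a `findMovedPaths`-style pass that collects those source paths. The hidden property name `:source-path` and the class `ToyTree` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy transient tree: path -> properties. Not Oak's NodeBuilder.
class ToyTree {
    // Hypothetical name for the hidden move annotation.
    static final String SOURCE_PATH = ":source-path";

    final Map<String, Map<String, String>> nodes = new HashMap<>();

    void add(String path) {
        nodes.put(path, new HashMap<>());
    }

    // Similar in spirit to annotating in NodeBuilder.moveTo(): relocate the
    // subtree and record where its root came from.
    void moveTo(String source, String target) {
        if (!nodes.containsKey(source)) {
            throw new IllegalArgumentException("no such node: " + source);
        }
        Map<String, Map<String, String>> moved = new HashMap<>();
        nodes.entrySet().removeIf(e -> {
            String p = e.getKey();
            boolean inSubtree = p.equals(source) || p.startsWith(source + "/");
            if (inSubtree) {
                moved.put(target + p.substring(source.length()), e.getValue());
            }
            return inSubtree;
        });
        nodes.putAll(moved);
        // putIfAbsent keeps the original source if the node is moved again.
        nodes.get(target).putIfAbsent(SOURCE_PATH, source);
    }

    // Extra pass in the spirit of findMovedPaths: collect the source paths
    // of all annotated nodes.
    Set<String> findMovedPaths() {
        Set<String> sources = new HashSet<>();
        for (Map<String, String> props : nodes.values()) {
            String src = props.get(SOURCE_PATH);
            if (src != null) {
                sources.add(src);
            }
        }
        return sources;
    }
}
```

Only the root of a moved subtree carries the annotation; its descendants travel with it unannotated.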
>On 17.10.13 2:16, Michael Dürig wrote:
>> Hi,
>> Currently we can't detect a move operation by diffing node states.
>> Such operations are seen as separate remove and add operations
>> that can't easily be associated with each other. This impacts permission
>> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090),
>> neither of which supports moves the way Jackrabbit 2 did.
>> As discussed several times before, it is not possible to recover move
>> operations by simply diffing node states. We need additional
>> information. One option is to annotate nodes (*) with their source path
>> as they are moved. With that we could detect whether an added node was
>> the target of a move operation and, if so, where the source of that
>> operation was. However, this comes with a performance penalty, since such
>> a diff could no longer be done in a single pass. In order
>> to decide whether a deleted node has been moved, the corresponding add
>> needs to be found first. In essence, this requires the diff operation to
>> do two passes: the first one for detecting move operations and the
>> second one for the other operations.
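A sketch of the two-pass diff on a deliberately flat toy model (an assumption, not Oak's NodeStateDiff): `before` is the set of paths in the base state, `after` maps each path in the head state to its source-path annotation or `null`. Pass 1 finds move targets through their annotation; only then can pass 2 tell a moved-away node from a deleted one. `TwoPassDiff` and the string change format are hypothetical, and corner cases such as re-creating a node at the source path are simplified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Flat toy model: before = paths in the base state; after = paths in the
// head state mapped to their source-path annotation (null if none).
class TwoPassDiff {
    static List<String> diff(Set<String> before, Map<String, String> after) {
        // Pass 1: find all move targets through their annotation. A move only
        // counts if the source existed before and is gone now.
        Map<String, String> sourceToTarget = new HashMap<>();
        for (Map.Entry<String, String> e : after.entrySet()) {
            String src = e.getValue();
            if (!before.contains(e.getKey()) && src != null
                    && before.contains(src) && !after.containsKey(src)) {
                sourceToTarget.put(src, e.getKey());
            }
        }
        Set<String> moveTargets = new HashSet<>(sourceToTarget.values());

        // Pass 2: with all moves known, each removal can be classified on the
        // spot. Paths are sorted only to make the output deterministic.
        List<String> changes = new ArrayList<>();
        for (String p : new TreeSet<>(before)) {
            if (!after.containsKey(p)) {
                String target = sourceToTarget.get(p);
                changes.add(target == null ? "deleted " + p : "moved " + p + " -> " + target);
            }
        }
        for (String p : new TreeSet<>(after.keySet())) {
            if (!before.contains(p) && !moveTargets.contains(p)) {
                changes.add("added " + p);
            }
        }
        return changes;
    }
}
```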
>> To avoid the second pass, we could also remember the paths of the moved
>> nodes in a global place (*). This would allow us to look up whether a
>> deleted node was moved (as opposed to deleted) as we go, and to detect
>> moved nodes as soon as we come across an added node that has a source
>> path annotation. As an added benefit, this approach allows us to detect
>> whether there was a move at all simply by checking whether there are
>> entries in this global place. If there are none, we could fall
>> back to a simpler diff mechanism.
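The single-pass variant can be sketched on the same flat toy model, with a hypothetical `movedSources` set standing in for the "global place" filled at move time. Each change is classified as soon as it is seen: a removal whose path is in the set is not reported as a delete, the move is reported at the annotated target instead, and an empty set means annotations are never consulted. `SinglePassDiff` is an illustrative name, not Oak code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Flat toy model as before, plus movedSources: a stand-in for the
// hypothetical "global place" where source paths are recorded at move time.
class SinglePassDiff {
    static List<String> diff(Set<String> before, Map<String, String> after,
                             Set<String> movedSources) {
        // Empty set: no move happened, so fall back to a plain diff.
        boolean checkMoves = !movedSources.isEmpty();
        Set<String> all = new TreeSet<>(before);
        all.addAll(after.keySet());
        List<String> changes = new ArrayList<>();
        // One traversal over all paths: no earlier pass is needed.
        for (String p : all) {
            boolean inBefore = before.contains(p);
            boolean inAfter = after.containsKey(p);
            if (inBefore && !inAfter) {
                // A moved source is not a delete; the move is reported at
                // its annotated target.
                if (!(checkMoves && movedSources.contains(p))) {
                    changes.add("deleted " + p);
                }
            } else if (!inBefore && inAfter) {
                String src = checkMoves ? after.get(p) : null;
                if (src != null && movedSources.contains(src)) {
                    changes.add("moved " + src + " -> " + p);
                } else {
                    changes.add("added " + p);
                }
            }
        }
        return changes;
    }
}
```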
>> (*) All such annotations would be stored as hidden items in transient
>> space and would have to be removed again by some hook before persisting.
>> WDYT, is this worth the trouble?
>> Michael
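The cleanup mentioned in the "(*)" note could look roughly like this toy hook. It is not Oak's CommitHook API, and `:source-path` is the same hypothetical hidden property name used above. It returns a copy of the head state with the annotation removed from every node and clears the stand-in for the global moved-paths set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a commit hook; not Oak's CommitHook API.
class StripMoveAnnotations {
    // Hypothetical hidden property name used for the move annotation.
    static final String SOURCE_PATH = ":source-path";

    // Returns a copy of the head state without move annotations (the input
    // map is left untouched) and empties the global moved-paths set.
    static Map<String, Map<String, String>> process(
            Map<String, Map<String, String>> head, Set<String> movedSources) {
        Map<String, Map<String, String>> cleaned = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> e : head.entrySet()) {
            Map<String, String> props = new HashMap<>(e.getValue());
            props.remove(SOURCE_PATH);
            cleaned.put(e.getKey(), props);
        }
        movedSources.clear();
        return cleaned;
    }
}
```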
