jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: Detecting move operations in node state diffs
Date Mon, 21 Oct 2013 12:20:19 GMT

Hi,

I implemented a very rough POC of the algorithm outlined below. See [1]
for the implementation itself. On a move, the node is annotated with its
source path in NodeBuilder.moveTo(). Moves can later be extracted
through the standalone MoveDetector class. See MoveDetectorTest for
details. MoveDetector also provides a static utility method,
findMovedPaths, for building the set of moved nodes the algorithm
requires. As mentioned below, this extra pass is not required if this
set can be obtained by other means.
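
To illustrate the idea independently of the POC, here is a minimal
sketch. It is not the code from [1] and does not use the Oak API: a
tree state is modelled as a flat map from path to source path
annotation (empty when the node was not moved), and MoveDiffSketch,
collectMoveSources and diff are hypothetical names. It assumes the move
target did not exist before and that moved subtrees need no special
handling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names are taken from the Oak POC.
class MoveDiffSketch {

    // Extra pass: collect the source paths of all annotated (moved) nodes.
    // Not needed if this set is already available from elsewhere.
    static Set<String> collectMoveSources(Map<String, String> after) {
        Set<String> moved = new TreeSet<>();
        for (String source : after.values()) {
            if (!source.isEmpty()) {
                moved.add(source);
            }
        }
        return moved;
    }

    // Single diff pass that reports moves instead of remove/add pairs.
    static List<String> diff(Map<String, String> before,
                             Map<String, String> after,
                             Set<String> movedPaths) {
        List<String> events = new ArrayList<>();
        Set<String> paths = new TreeSet<>(before.keySet());
        paths.addAll(after.keySet());
        for (String path : paths) {
            boolean wasThere = before.containsKey(path);
            boolean isThere = after.containsKey(path);
            if (wasThere && !isThere) {
                // A removed node is only a delete if it is not a move source.
                // Move sources are reported when the target is visited.
                if (!movedPaths.contains(path)) {
                    events.add("deleted " + path);
                }
            } else if (!wasThere && isThere) {
                String source = after.get(path);
                if (source.isEmpty()) {
                    events.add("added " + path);
                } else {
                    events.add("moved " + source + " -> " + path);
                }
            }
        }
        return events;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("/a", "");
        before.put("/b", "");
        Map<String, String> after = new HashMap<>();
        after.put("/c", "/a"); // /a was moved to /c
        after.put("/d", "");   // /d is a plain add
        for (String event : diff(before, after, collectMoveSources(after))) {
            System.out.println(event);
        }
    }
}
```

For the two states built in main this reports a delete of /b, a move
from /a to /c and an add of /d, rather than a separate remove of /a and
add of /c.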

See [2] for how this could be integrated with the current observation 
implementation.

If we decide to go with such an approach at all, we still need to 
figure out how to better integrate it with the current node state diff.

Michael


[1] 
https://github.com/mduerig/jackrabbit-oak/commit/a74ea2095d5a3aea2e27dbc0b18038eec11f315a
[2] 
https://github.com/mduerig/jackrabbit-oak/commit/ad81af03f9c8c8ab11acd614e44c27ad34292b88


On 17.10.13 2:16 , Michael Dürig wrote:
>
> Hi,
>
> Currently we can't detect a move operation through diffing node states.
> Those operations are currently seen as separate remove and add operations
> that can't be easily associated with each other. This impacts permission
> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), neither
> of which supports moves as well as Jackrabbit 2 did.
>
> As discussed several times before, it is not possible to recover move
> operations by simply diffing node states. We need additional
> information. One option is to annotate nodes (*) as they are moved with
> their source path. With that we could detect whether an added node was
> the target of a move operation and if so where the source of that
> operation was. However, this comes with a performance penalty since such
> a diff operation could not be done in a single pass any more. In order
> to decide whether a deleted node has been moved, the corresponding add
> needs to be found first. In essence this requires the diff operation to
> do two passes: the first one for detecting move operations and the
> second one for the other operations.
>
> To avoid the extra pass, we could also remember the paths of the moved
> nodes in a global place (*). This would allow us to look up whether a
> deleted node was moved (as opposed to deleted) as we go and detect moved
> nodes as soon as we come across an added node that has a source path
> annotation. As an added benefit this approach allows us to detect
> whether there was a move at all simply by checking whether there are
> entries in this global place. If this is not the case, we could fall
> back to a simpler diff mechanism.
>
> (*) All such annotations would happen as hidden items in transient space
> and would have to be removed again by some hook before persisting.
>
> WDYT, is this worth the trouble?
>
> Michael
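
The annotation scheme from the quoted message could be sketched as
follows. This is only an illustration under several assumptions: the
transient space is a plain map from path to properties, only single
nodes are moved, the property name ":source-path" and all class and
method names are made up, and on repeated moves the original source
path is kept.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and data model are assumptions, not Oak API.
class TransientSpaceSketch {

    // Assumed name of the hidden annotation carrying the source path.
    static final String SOURCE_PATH = ":source-path";

    // Transient nodes: path -> properties of the node at that path.
    final Map<String, Map<String, String>> nodes = new HashMap<>();

    // The "global place" remembering the source paths of moved nodes.
    final Set<String> movedPaths = new HashSet<>();

    void add(String path) {
        nodes.put(path, new HashMap<>());
    }

    void move(String source, String target) {
        Map<String, String> props = nodes.remove(source);
        if (props == null) {
            throw new IllegalArgumentException("No node at " + source);
        }
        // Annotate on the first move only, so the original source survives
        // a chain of moves (a design choice of this sketch).
        if (props.putIfAbsent(SOURCE_PATH, source) == null) {
            movedPaths.add(source);
        }
        nodes.put(target, props);
    }

    // No entries means no moves, so a simpler diff could be used.
    boolean hasMoves() {
        return !movedPaths.isEmpty();
    }

    // Stand-in for the hook that removes the annotations before persisting.
    Map<String, Map<String, String>> stripForPersist() {
        Map<String, Map<String, String>> persisted = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> entry : nodes.entrySet()) {
            Map<String, String> copy = new HashMap<>(entry.getValue());
            copy.remove(SOURCE_PATH);
            persisted.put(entry.getKey(), copy);
        }
        return persisted;
    }
}
```

A diff would consult hasMoves() first and only take the move-aware
path when it returns true, while stripForPersist() ensures that none of
the annotations end up in the persisted state.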
