jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: oak-api and move operations
Date Fri, 30 Mar 2012 12:31:27 GMT


On 30.3.12 13:18, Jukka Zitting wrote:
> Hi,
>
> On Fri, Mar 30, 2012 at 2:05 PM, Michael Dürig <mduerig@apache.org> wrote:
>> I don't know about the details of the algorithm Git uses. But *if* that
>> algorithm *does* reconstruct move and copy operations from looking at the
>> raw trees, I'm pretty sure they annotated the trees in some ways to track
>> that information.
>
> Ultimately it's just a comparison of matching content, see the -M and
> -C options in git-diff(1). Git uses content hashes to optimize such
> comparisons.

Aha, there it is! My original phrasing was (emphasis added): "there is no
way to *reliably* recover move and copy operations". Git gives up on
reliability. No wonder, since this is basically the tree homomorphism
problem, which is NP-complete.
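
To make the reliability point concrete, here is a minimal Python sketch of recovering moves purely by comparing content hashes of two snapshots. It is not Git's actual algorithm (the -M and -C options also score similarity between non-identical content), and the flat {path: subtree} snapshot layout as well as the names content_hash and guess_moves are assumptions made up for illustration.

```python
import hashlib
import json
from collections import defaultdict


def content_hash(node):
    """Hash a subtree (nested dicts and leaf values) via its canonical JSON form."""
    canonical = json.dumps(node, sort_keys=True).encode("utf-8")
    return hashlib.sha1(canonical).hexdigest()


def guess_moves(before, after):
    """Guess moves between two flat {path: subtree} snapshots (hypothetical layout).

    Returns (moves, ambiguous). moves holds (old_path, new_path) pairs whose
    content hash matches exactly one removed and one added path. ambiguous
    holds (old_paths, new_paths) groups where the pairing cannot be decided.
    """
    removed = {p: n for p, n in before.items() if p not in after}
    added = {p: n for p, n in after.items() if p not in before}

    removed_by_hash = defaultdict(list)
    for path, node in removed.items():
        removed_by_hash[content_hash(node)].append(path)

    added_by_hash = defaultdict(list)
    for path, node in added.items():
        added_by_hash[content_hash(node)].append(path)

    moves, ambiguous = [], []
    for digest, old_paths in removed_by_hash.items():
        new_paths = added_by_hash.get(digest, [])
        if len(old_paths) == 1 and len(new_paths) == 1:
            moves.append((old_paths[0], new_paths[0]))
        elif new_paths:
            ambiguous.append((sorted(old_paths), sorted(new_paths)))
    return moves, ambiguous


before = {"/a": {"title": "x"}, "/b": {"title": "same"}, "/c": {"title": "same"}}
after = {"/d": {"title": "x"}, "/e": {"title": "same"}, "/f": {"title": "same"}}
moves, ambiguous = guess_moves(before, after)
```

In this example /a can be paired with /d, but the two identical nodes /b and /c cannot be told apart. Even the unique match is only a guess: a delete of /a followed by an unrelated add of equal content at /d produces exactly the same pair of snapshots.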

>
>> However, that doesn't solve the issue at hand. If we go down that route, why
>> should we bother at all and reconstruct the operations from the state only
>> to construct a JSOP statement to be given to Microkernel.commit()? We should
>> rather do away with this entirely and pass the new tree directly to the
>> Microkernel.
>
> Exactly. IMHO we should adjust the MK interface to support this. The
> solution should also address handling of large imports.

OK, ack. That way we'd circumvent the above problem.

However, why do you favour the state-based approach over the
transformation-based approach? The latter seems much lighter while carrying
more information, and it scales much better to big data.
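
A rough Python sketch of the size argument, under stated assumptions: the nested-dict tree, the ("move", src, dst) tuple and the helpers make_subtree and apply_move are hypothetical stand-ins, not the Microkernel interface or JSOP syntax. It compares the payload needed to describe a move of a large subtree as an operation with the payload of the resulting state serialised in full.

```python
import json


def make_subtree(n_children):
    """Build a subtree with n_children small child nodes."""
    return {f"child{i}": {"value": i} for i in range(n_children)}


def apply_move(tree, src, dst):
    """Move the node at path src to path dst (tuples of names), in place."""
    parent = tree
    for name in src[:-1]:
        parent = parent[name]
    node = parent.pop(src[-1])

    target = tree
    for name in dst[:-1]:
        target = target[name]
    target[dst[-1]] = node


tree = {"a": {"big": make_subtree(1000)}, "b": {}}

# Transformation-based: the change is described by the operation itself.
operation = ("move", "/a/big", "/b/big")
apply_move(tree, ("a", "big"), ("b", "big"))

# State-based, naively: the change is described by the resulting tree.
operation_size = len(json.dumps(operation))
state_size = len(json.dumps(tree))
```

The operation payload stays the same size however large the moved subtree is, while the naive state payload grows with it. A state-based interface that shares unchanged subtrees by reference would narrow that gap, so the full-tree figure here is an upper bound rather than a fair measurement.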

Michael

>
>>> More generally, what's the use case where this functionality is
>>> needed? I'd be happy to even drop support for NODE_MOVED observation
>>> events unless we have real clients that actually require that
>>> information (as opposed to just create/update/delete events).
>>
>> I'd be fine with that. However, if clustering relies on journal scraping,
>> this is not an option.
>
> Good point. Intuitively I'd be fine with placing such a restriction on
> the underlying clustering implementation, though it would be really
> nice if we had some performance/scalability figures to back that
> assumption.
>
> BR,
>
> Jukka Zitting
