jackrabbit-oak-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: commit hooks and indexing
Date Fri, 15 Feb 2013 10:05:06 GMT
Hi,

To update the external index, the hooks could be used to remember what
content may need to be indexed. That is:

(A) The index hook stores the path of the added/removed/changed nodes
(only for those nodes that contain indexed data) in a separate
'toBeIndexed' node, in the form of a tree. This is done synchronously with
the commits.

(B) The index updater (an asynchronous thread) reads this 'toBeIndexed'
node from time to time, to check if anything needs to be done. If yes, it
checks the content (the newest revision for added data, the old revision
for removed data). Once this is done, the 'toBeIndexed' node is cleaned.
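
As a rough illustration of (A) and (B), here is a minimal in-memory sketch
in Java. All names (ToBeIndexedSketch, record, drain) are hypothetical and
are not Oak APIs; a real hook would store the tree in the repository as
part of the commit. The sketch also cleans the tree when the updater reads
it, not after indexing has finished, so it ignores failure handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the 'toBeIndexed' tree; none of these names are Oak APIs. */
public class ToBeIndexedSketch {

    /** One node of the tree; children are keyed by path segment. */
    private static final class Node {
        final Map<String, Node> children = new TreeMap<>();
        boolean marked; // true if this exact path was added/removed/changed
    }

    private Node root = new Node();

    /** Step (A): called synchronously by the index hook for each affected path. */
    public synchronized void record(String path) {
        Node node = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            node = node.children.computeIfAbsent(segment, s -> new Node());
        }
        node.marked = true;
    }

    /** Step (B): called by the asynchronous updater; returns pending paths and cleans the tree. */
    public synchronized List<String> drain() {
        Node snapshot = root;
        root = new Node(); // later commits record into a fresh tree
        List<String> paths = new ArrayList<>();
        collect(snapshot, "", paths);
        return paths;
    }

    /** Depth-first walk that rebuilds the full path of every marked node. */
    private static void collect(Node node, String path, List<String> out) {
        if (node.marked) {
            out.add(path.isEmpty() ? "/" : path);
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            collect(e.getValue(), path + "/" + e.getKey(), out);
        }
    }
}
```

Storing the paths as a tree means repeated changes to the same node collapse
into a single entry, so the updater visits each changed path once per run.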

This is like using the JCR EventJournal (if we want to support that), but
(A) can better filter the events that need to be stored.

Regards,
Thomas


On 2/15/13 8:11 AM, "Jukka Zitting" <jukka.zitting@gmail.com> wrote:

>Hi,
>
>On Thu, Feb 14, 2013 at 10:29 PM, Michael Dürig <mduerig@apache.org>
>wrote:
>> However, there is a difference depending on whether the index is stored in
>> content or external. For the former case using commit hooks is the right
>> thing to do. In the case of a failed commit nothing is written at all, not
>> even the index data. Using an observer here still works, but would leave
>> the index lagging behind from the time the commit actually succeeded until
>> the observer is finally called.
>
>I think both options are valid for an in-content index, the basic
>trade-off here is between commit speed and conflict handling on the
>one hand and instant availability of index updates on the other.
>
>A hook-based index is by definition always up to date with latest
>content, and is thus useful especially for things like UUID tables and
>other internal indices that need to be kept up to date at all times.
>
>However, hooks add overhead to each individual commit and will either
>need automatic conflict resolution or synchronous execution to avoid
>index corruption in cases of concurrent commits. That makes them
>non-ideal for many of the more complex types of indices.
>
>Luckily most of the potential complex indices don't need to be up to
>date at all times, and thus can well be updated via an observer even
>if the index content is stored in the repository. In such cases the
>observer treats the repository like any other external index storage
>(i.e. it's not updated through the Observer interface like how hooks
>work), and would just need to make sure to ignore the content updates
>it itself makes.
>
>> For externally stored indexes I think we need to live with the lag in favour
>> of having a consistent index.
>
>Right; without implementing full distributed transaction support (and
>the associated concurrency overhead) it's impossible to keep an
>external index in sync with the repository at all times.
>
>BR,
>
>Jukka Zitting

