jackrabbit-oak-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: Asynchronous indexing consistency
Date Wed, 29 May 2013 12:01:36 GMT

>There could be various reasons for why an indexer might not be
>available for an extended amount of time

Possibly you are right, but let me try to challenge this assumption:

Wouldn't it be a problem if the index isn't updated for a long time? Don't
we need protection against an outdated index? That is, unless each cluster
node keeps its own index, but I guess we should try to avoid that for the
sake of scalability.

(I see there are other uses for an EventJournal, but it's easier if we
talk about concrete use cases).

>a) If, like in the Segment and H2 MKs, we could rely on the MKs
>supporting cheap copies and diffs across subtrees, we could implement
>this without API changes by keeping a copy of the last indexed/seen
>state of the repository in a hidden subtree. The indexer would refresh
>this copy on each index update, and could thus always know what
>content has already been indexed. Unfortunately there probably isn't
>any easy way to do this in the MongoMK.

It sounds like reading with old revisions. We would still need a mechanism
to tell the MongoMK which revisions to keep; so far we have assumed a
fixed garbage collection limit (actually we didn't define that yet). In
fact, the MongoMK currently keeps all old revisions, because garbage
collection isn't implemented yet :-)

The difference I see is that with your approach intermediate revisions
could be garbage collected earlier, but that might be an unnecessary
optimization.
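
To make the "reading with old revisions" idea concrete, here is a minimal
in-memory sketch. It is not Oak or MongoMK code: the class
RevisionStoreSketch and all of its methods are hypothetical, and each
revision is stored as a full snapshot purely for simplicity. It only shows
the contract discussed above: the indexer diffs from the last indexed
revision to the head, and garbage collection has to be told to keep that
one old revision while intermediate ones may go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a revisioned store; not an Oak or MongoMK API. */
class RevisionStoreSketch {
    // revision number -> full snapshot (path -> value) at that revision
    private final TreeMap<Long, Map<String, String>> revisions = new TreeMap<>();
    private long head = 0;
    // revision the indexer last processed; gc() must not remove it
    private long indexedRevision = 0;

    RevisionStoreSketch() {
        revisions.put(0L, new TreeMap<>());
    }

    /** Commits one change (a null value deletes the path); returns the new head. */
    long commit(String path, String value) {
        Map<String, String> next = new TreeMap<>(revisions.get(head));
        if (value == null) {
            next.remove(path);
        } else {
            next.put(path, value);
        }
        head++;
        revisions.put(head, next);
        return head;
    }

    /** Paths added, changed, or removed between two retained revisions. */
    List<String> diff(long from, long to) {
        Map<String, String> a = revisions.get(from);
        Map<String, String> b = revisions.get(to);
        if (a == null || b == null) {
            throw new IllegalStateException("revision was garbage collected");
        }
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> e : b.entrySet()) {
            if (!e.getValue().equals(a.get(e.getKey()))) {
                changed.add(e.getKey()); // added or modified
            }
        }
        for (String path : a.keySet()) {
            if (!b.containsKey(path)) {
                changed.add(path); // removed
            }
        }
        return changed;
    }

    /** Indexer run: diff from the last indexed revision to head, then advance. */
    List<String> updateIndex() {
        List<String> changed = diff(indexedRevision, head);
        indexedRevision = head;
        return changed;
    }

    /** Drops every revision except the head and the last indexed one. */
    void gc() {
        revisions.keySet().removeIf(r -> r != head && r != indexedRevision);
    }

    int retainedRevisions() {
        return revisions.size();
    }
}
```

In this sketch the intermediate revisions can be collected at any time,
because only the last indexed revision and the head are needed for the
next diff. Whether the MongoMK can retain a single old revision that
cheaply is exactly the open question.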

