jackrabbit-oak-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: How to avoid nodes to be indexed
Date Thu, 06 Mar 2014 14:46:59 GMT

On Thu, Mar 6, 2014 at 9:32 AM, Tommaso Teofili
<tommaso.teofili@gmail.com> wrote:
> for my Solr (indexing) resiliency use case [1] I've implemented an
> extension of Solr client which is able to cache requests if Solr goes down
> and execute them back once the Solr instance comes back.
> Now if the repository goes down during the Solr downtime we lose the
> cached requests as they live in memory so we could write such queued
> requests down as nodes in the repository for persisting them and eventually
> fetch them once Solr comes live again, but then they may get indexed and
> that would lead to a loop.
> So I wonder if there's any way we can tell the repository we don't want
> some nodes (based on e.g. primaryType and/or path and/or property
> existing/missing) to be indexed, whatever an IndexEditor is supposed to do.

Sounds like an XY problem:

X: Ensuring that the Solr index is (eventually) consistent with content
in the repository even if the Solr server is down at times.
Y: Exclude certain nodes from being indexed.

There's a much easier solution to X:

The async indexer mechanism keeps track of the last repository
checkpoint that has been indexed. If your indexing code throws an
exception whenever the Solr server is down, the latest checkpoint
won't be marked as indexed. The next iteration of the async indexer
will then restart from the previous checkpoint and recreate all the
potentially failed Solr indexing requests.
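
To illustrate the idea, here is a minimal, self-contained sketch in
plain Java. It is not Oak's actual API: the class CheckpointRetrySketch,
the FakeSolr stand-in and the method runCycle are all hypothetical
names, and a "checkpoint" is modelled simply as a position in a list of
committed paths. It also assumes that re-sending a request for the same
path is harmless (idempotent), which is why the fake index is a set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of checkpoint-based async indexing (not Oak's API). */
class CheckpointRetrySketch {

    /** Stand-in for a Solr client that may be unreachable. */
    static class FakeSolr {
        boolean up = true;
        // A set models idempotent updates: indexing the same path twice is harmless.
        final Set<String> indexed = new LinkedHashSet<>();

        void index(String path) {
            if (!up) {
                throw new IllegalStateException("Solr is down");
            }
            indexed.add(path);
        }
    }

    // Committed paths in order; a checkpoint is a position in this list.
    private final List<String> changes = new ArrayList<>();
    private int lastIndexedCheckpoint = 0;

    void commit(String path) {
        changes.add(path);
    }

    int lastIndexedCheckpoint() {
        return lastIndexedCheckpoint;
    }

    /** Runs one indexing cycle; returns true only if every pending change was indexed. */
    boolean runCycle(FakeSolr solr) {
        int head = changes.size();
        try {
            for (int i = lastIndexedCheckpoint; i < head; i++) {
                solr.index(changes.get(i));
            }
        } catch (IllegalStateException e) {
            // Failure: leave the checkpoint untouched so the next cycle replays these changes.
            return false;
        }
        // Success: only now is the new checkpoint marked as indexed.
        lastIndexedCheckpoint = head;
        return true;
    }

    public static void main(String[] args) {
        CheckpointRetrySketch repo = new CheckpointRetrySketch();
        FakeSolr solr = new FakeSolr();

        repo.commit("/a");
        System.out.println("cycle 1 ok: " + repo.runCycle(solr));

        solr.up = false;            // Solr goes down
        repo.commit("/b");
        repo.commit("/c");
        System.out.println("cycle 2 ok: " + repo.runCycle(solr));

        solr.up = true;             // Solr comes back
        System.out.println("cycle 3 ok: " + repo.runCycle(solr));
        System.out.println("indexed: " + solr.indexed);
    }
}
```

Nothing has to be queued or persisted separately here: the failed
cycle leaves the checkpoint where it was, so the pending changes are
simply picked up again on the next cycle.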


Jukka Zitting
