lucene-dev mailing list archives

From "Uwe Schindler (JIRA)"
Subject [jira] [Commented] (LUCENE-4560) Support Filtering Segments During Merge
Date Mon, 19 Nov 2012 07:38:59 GMT


Uwe Schindler commented on LUCENE-4560:

We had something similar in the past (called PayloadProcessor), which was removed completely
in 4.0 (without a "replacement"). The reason was that the same functionality can be implemented
inside a FilterAtomicReader and used with IW#addIndexes(IndexReader...). I agree with Shai that
this should be enough for most cases, especially as gradually merging segments can corrupt your
index if your filter has an error.

If you really want to merge in-place:
Your patch has nice ideas from my perspective, only the "wrapping" should be done in the
MergePolicy and not at the IndexWriter level (the number of settings in IndexWriterConfig is
already too big). So the main things that need to be done here are:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper MergePolicy (like UpgradeIndexMergePolicy)
that wraps the AtomicReaders when creating the MergePolicy.OneMerge instances.
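
To make the shape of such a wrapper policy concrete, here is a minimal, library-free sketch of
the decorator idea. It deliberately uses no Lucene classes: ToyReader, ToyOneMerge,
ToyMergePolicy, wrapping() and withoutField() are all hypothetical names invented for this
illustration, and a "reader" is reduced to a list of field names. It only shows where the
wrapping would happen, not how the real MergePolicy API looks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model only: none of these types are Lucene API.
final class WrappingPolicySketch {

    /** Hypothetical stand-in for an AtomicReader: exposes only field names. */
    interface ToyReader {
        List<String> fieldNames();
    }

    /** Hypothetical stand-in for a merge that carries its own readers. */
    static final class ToyOneMerge {
        final List<ToyReader> readers;

        ToyOneMerge(List<ToyReader> readers) {
            this.readers = readers;
        }
    }

    /** Hypothetical stand-in for a merge policy. */
    interface ToyMergePolicy {
        List<ToyOneMerge> findMerges(List<ToyReader> segments);
    }

    /** Decorator: lets the base policy pick the merges, then wraps every reader in them. */
    static ToyMergePolicy wrapping(ToyMergePolicy base, UnaryOperator<ToyReader> wrapper) {
        return segments -> {
            List<ToyOneMerge> result = new ArrayList<>();
            for (ToyOneMerge merge : base.findMerges(segments)) {
                List<ToyReader> wrapped = new ArrayList<>();
                for (ToyReader reader : merge.readers) {
                    wrapped.add(wrapper.apply(reader));
                }
                result.add(new ToyOneMerge(wrapped));
            }
            return result;
        };
    }

    /** Example filter: hides one field from the reader it wraps. */
    static ToyReader withoutField(ToyReader in, String field) {
        return () -> {
            List<String> names = new ArrayList<>(in.fieldNames());
            names.remove(field);
            return names;
        };
    }

    public static void main(String[] args) {
        ToyMergePolicy mergeEverything = segments -> List.of(new ToyOneMerge(segments));
        ToyMergePolicy policy = wrapping(mergeEverything, r -> withoutField(r, "legacy"));
        ToyReader segment = () -> List.of("id", "legacy");
        for (ToyOneMerge merge : policy.findMerges(List.of(segment))) {
            for (ToyReader reader : merge.readers) {
                System.out.println(reader.fieldNames()); // prints [id]
            }
        }
    }
}
```

The point of the design is that merge selection stays with the wrapped policy, so no new
setting is needed on the writer configuration.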

Another possible approach *without modification in Lucene core* is:
- open IndexWriter
- get NRT Reader and wrap with one or more FilterAtomicReader
- addIndexes the filtered segments
- delete the old segments manually (e.g. by deleting all documents)
- start final maybeMerge()
- commit
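
The order of the steps above can be sketched with a small toy model. This is not Lucene code:
a "segment" is just a list of documents, a document is a field-to-value map, and
filteredView() and migrate() are hypothetical helpers that stand in for the
FilterAtomicReader wrapping and the addIndexes/delete/maybeMerge sequence. The commit step
has no counterpart here because the toy index lives only in memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: no Lucene classes are used.
final class FilteredReindexSketch {

    /** Plays the role of a filtering reader: exposes the segment without one field. */
    static List<Map<String, String>> filteredView(List<Map<String, String>> segment,
                                                  String obsoleteField) {
        List<Map<String, String>> view = new ArrayList<>();
        for (Map<String, String> doc : segment) {
            Map<String, String> copy = new LinkedHashMap<>(doc);
            copy.remove(obsoleteField);
            view.add(copy);
        }
        return view;
    }

    /** Walks through the steps: snapshot, wrap, add, delete old, merge. */
    static void migrate(List<List<Map<String, String>>> index, String obsoleteField) {
        // "Get NRT reader": a point-in-time snapshot of the current segments.
        List<List<Map<String, String>>> oldSegments = new ArrayList<>(index);

        // "Wrap and addIndexes": append a filtered copy of every old segment.
        for (List<Map<String, String>> segment : oldSegments) {
            index.add(filteredView(segment, obsoleteField));
        }

        // "Delete the old segments": they sit at the front of the list.
        index.subList(0, oldSegments.size()).clear();

        // "Final merge": collapse the remaining segments into one.
        List<Map<String, String>> merged = new ArrayList<>();
        for (List<Map<String, String>> segment : index) {
            merged.addAll(segment);
        }
        index.clear();
        index.add(merged);
    }

    public static void main(String[] args) {
        List<List<Map<String, String>>> index = new ArrayList<>();
        index.add(List.of(Map.of("id", "1", "legacy", "x")));
        index.add(List.of(Map.of("id", "2", "legacy", "y"), Map.of("id", "3")));
        migrate(index, "legacy");
        System.out.println(index); // prints [[{id=1}, {id=2}, {id=3}]]
    }
}
```

Note that the filtered copies are added before the old segments are removed, so the original
data is still present if the filter turns out to be wrong.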

> Support Filtering Segments During Merge
> ---------------------------------------
>                 Key: LUCENE-4560
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields have different
> options (the most restrictive option is forced during merge).
> Being able to filter segments during merges will allow gradually migrating indexed data
> to new index settings, and will support pruning/enhancing existing data gradually.
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually Remove index fields no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> patch will be forthcoming
