lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-4560) Support Filtering Segments During Merge
Date Mon, 19 Nov 2012 07:48:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500073#comment-13500073
] 

Uwe Schindler edited comment on LUCENE-4560 at 11/19/12 7:48 AM:
-----------------------------------------------------------------

We had something similar in the past (called PayloadProcessor), which was removed completely
in 4.0 (without "replacement"). The reason was that the same thing can be implemented inside a
FilterAtomicReader and used with IW#addIndexes(IndexReader...). I agree with Shai that this
should be enough for most cases, especially as gradually merging segments can corrupt your
index if you have an error.

If you really want to merge in-place:
Your patch has nice ideas from my perspective, only the "wrapping" should be done in the
MergePolicy (MP) and not at the IndexWriter level (the number of settings in IWConfig is
already too big). So the main things that need to be done here are:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper MergePolicy, like UpgradeIndexMergePolicy,
that wraps the AtomicReaders when creating the MergePolicy.OneMerge instances.
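
To make the wrapper idea concrete, here is a minimal, library-free sketch of the delegation pattern. It is not Lucene code: ToyReader, ToyOneMerge, ToyMergePolicy, WrappingPolicy and dropField are hypothetical stand-ins, and the sketch assumes a OneMerge-like object that carries its readers, which is what the first bullet proposes. The wrapping policy lets a base policy pick the merges and only decorates the readers inside each selected merge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy model only: every type here is a hypothetical stand-in, not a Lucene class. */
final class WrappingMergePolicySketch {

    /** Stand-in for a per-segment reader; it only exposes field names. */
    interface ToyReader {
        List<String> fields();
    }

    /** Stand-in for MergePolicy.OneMerge, assumed here to carry the readers to merge. */
    static final class ToyOneMerge {
        final List<ToyReader> readers;

        ToyOneMerge(List<ToyReader> readers) {
            this.readers = Collections.unmodifiableList(new ArrayList<>(readers));
        }
    }

    /** Stand-in for a merge policy: selects which segments get merged together. */
    interface ToyMergePolicy {
        List<ToyOneMerge> findMerges(List<ToyReader> segments);
    }

    /** Delegates merge selection to a base policy, then wraps each selected reader. */
    static final class WrappingPolicy implements ToyMergePolicy {
        private final ToyMergePolicy base;
        private final UnaryOperator<ToyReader> wrapper;

        WrappingPolicy(ToyMergePolicy base, UnaryOperator<ToyReader> wrapper) {
            this.base = base;
            this.wrapper = wrapper;
        }

        @Override
        public List<ToyOneMerge> findMerges(List<ToyReader> segments) {
            List<ToyOneMerge> result = new ArrayList<>();
            for (ToyOneMerge merge : base.findMerges(segments)) {
                List<ToyReader> wrapped = new ArrayList<>();
                for (ToyReader reader : merge.readers) {
                    wrapped.add(wrapper.apply(reader));
                }
                result.add(new ToyOneMerge(wrapped));
            }
            return result;
        }
    }

    /** Example wrapper: a filtered view that hides one field from the merge. */
    static UnaryOperator<ToyReader> dropField(String name) {
        return reader -> () -> {
            List<String> kept = new ArrayList<>(reader.fields());
            kept.remove(name);
            return kept;
        };
    }
}
```

The point of the design is that the base policy stays unaware of the filtering, so no extra setting is needed on the writer configuration.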

Another possible approach *without modification in Lucene core* is:
- open IndexWriter
- get NRT Reader and wrap with one or more FilterAtomicReader, or leave as-is, if no upgrade
is needed.
- delete the old segments manually (e.g. by deleting all documents)
- addIndexes the filtered segments (optionally one-by-one, so it will not merge all atomic
readers into one new segment)
- commit
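
The order of these steps can be sketched with a small, library-free model. Nothing below is a Lucene class: Segment, ToyWriter, filtered, dropField and migrate are hypothetical names, the snapshot stands in for the NRT reader, and filtered stands in for a FilterAtomicReader (it copies eagerly here, purely for simplicity). The toy addIndexes is assumed to merge everything passed in one call into a single new segment, which is why the loop adds the filtered segments one by one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model only: every type here is a hypothetical stand-in, not a Lucene class. */
final class FilteredReindexSketch {

    /** A "segment" is an immutable list of documents (field name -> value). */
    static final class Segment {
        final List<Map<String, String>> docs;

        Segment(List<Map<String, String>> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }
    }

    /** Stand-in for the writer: pending segments plus the last committed state. */
    static final class ToyWriter {
        private final List<Segment> pending = new ArrayList<>();
        private List<Segment> committed = new ArrayList<>();

        /** Plays the role of the NRT reader: a point-in-time list of segments. */
        List<Segment> snapshot() { return new ArrayList<>(pending); }

        void deleteAll() { pending.clear(); }

        /** Merges everything passed in one call into ONE new segment. */
        void addIndexes(Segment... sources) {
            List<Map<String, String>> merged = new ArrayList<>();
            for (Segment s : sources) merged.addAll(s.docs);
            pending.add(new Segment(merged));
        }

        void commit() { committed = new ArrayList<>(pending); }

        List<Segment> committed() { return committed; }
    }

    /** Plays the role of a filtering reader: rewrites each document of a segment. */
    static Segment filtered(Segment source, UnaryOperator<Map<String, String>> docFilter) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : source.docs) out.add(docFilter.apply(doc));
        return new Segment(out);
    }

    /** Example filter: drop a field that is no longer used. */
    static UnaryOperator<Map<String, String>> dropField(String name) {
        return doc -> {
            Map<String, String> copy = new LinkedHashMap<>(doc);
            copy.remove(name);
            return copy;
        };
    }

    /** The steps from the list above, applied to an already open writer. */
    static void migrate(ToyWriter writer, UnaryOperator<Map<String, String>> docFilter) {
        List<Segment> wrapped = new ArrayList<>();
        for (Segment s : writer.snapshot()) {            // get the reader and wrap it
            wrapped.add(filtered(s, docFilter));
        }
        writer.deleteAll();                              // delete the old segments
        for (Segment s : wrapped) writer.addIndexes(s);  // one by one: boundaries are kept
        writer.commit();                                 // make the result visible
    }
}
```

In this model, calling `migrate(writer, dropField("legacy"))` on a writer with two segments leaves two committed segments whose documents no longer carry the `legacy` field; passing all wrapped segments to a single addIndexes call would instead produce one segment.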

Uwe
                
      was (Author: thetaphi):
    We had something similar in the past (called PayloadProcessor), which was removed completely
in 4.0 (without "replacement"). The reason was, that the stuff can be implemented inside a
FilterAtomicReader and used with IW#addIndexes(IndexReader...). I agree with Shai, that this
should be enough for most cases, especially as gradually merging segments can corrupt your
index if you have an error.

If you really want to merge in-place:
Your patch has nice ideas from my perspective, only the "wrapping" should be done in the MP
and not on IndexWriter level (the number of settings in IWConfig is already too big). So the
main thing that needs to be done here is:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper-MergePolicy like UpgradeIndexMergePolicy,
that wraps the AtomicReaders when creating the MergePolicy.OneMerge instances.

Another possible approach *without modification in Lucene core* is:
- open IndexWriter
- get NRT Reader and wrap with one or more FilterAtomicReader
- delete the old segments manually (e.g. by deleting all documents)
- addIndexes the filtered segments
- start final maybeMerge()
- commit

Uwe
                  
> Support Filtering Segments During Merge
> ---------------------------------------
>
>                 Key: LUCENE-4560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4560
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
>
>
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields have different
> options (the most restrictive option is forced during merge).
> Being able to filter segments during merges will allow gradually migrating indexed data
> to new index settings and support pruning/enhancing existing data gradually.
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually Remove index fields no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> patch will be forthcoming

