lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4560) Support Filtering Segments During Merge
Date Sun, 18 Nov 2012 18:42:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499871#comment-13499871 ]

Shai Erera commented on LUCENE-4560:
------------------------------------

Hi Tim. While in general I'm not against the idea (and I think having more control during
the merge stage is generally needed), may I ask why you can't, e.g., do this (borrowing
from your patch):

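{code}
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_40, analyzer);
IndexWriter writer = new IndexWriter(newDirectory, conf);
// Copy the old index through the filtering reader into the new directory
writer.addIndexes(new RemoveFieldReader(oldReader));
writer.close();
{code}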

That will accomplish, I believe, exactly what you want, no?

The benefit of your approach is that the filtering is done in place, i.e. there's no need to add
to a new directory and then switch the old/new dirs. But it may also inadvertently introduce bugs, e.g.
if someone mistakenly decides to remove a field, or worse, removes the wrong field ... With
the addIndexes approach, you can do the process offline, inspect the resulting index and,
once you're content with it, make the switch.
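To make the offline workflow concrete, here is a toy Java sketch. It is an illustration under loud assumptions, not Lucene code: the "index" is an in-memory list of documents, `OfflineRebuild` is a hypothetical helper, and an `AtomicReference` stands in for switching the old/new directories. The rebuild writes into a fresh copy, the caller inspects it, and only a passing check makes it live; a failing check leaves the source untouched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical toy model (not a Lucene class): an "index" is an immutable
// list of documents, each mapping field name -> value.
final class OfflineRebuild {
  // The live index; readers always see either the complete old or complete new index.
  final AtomicReference<List<Map<String, String>>> live;

  OfflineRebuild(List<Map<String, String>> initial) {
    live = new AtomicReference<>(List.copyOf(initial));
  }

  // Copies every document through the filter into a NEW list (the "new directory").
  // The live index is never modified here. Assumes no concurrent writes during the rebuild.
  List<Map<String, String>> rebuild(UnaryOperator<Map<String, String>> filter) {
    List<Map<String, String>> copy = new ArrayList<>();
    for (Map<String, String> doc : live.get()) {
      copy.add(Map.copyOf(filter.apply(doc)));
    }
    return List.copyOf(copy);
  }

  // Makes the rebuilt index live only if it passes the caller's inspection.
  boolean swapIfValid(List<Map<String, String>> candidate,
                      Predicate<List<Map<String, String>>> check) {
    if (!check.test(candidate)) {
      return false; // keep the old index as-is
    }
    live.set(candidate);
    return true;
  }

  // Example filter (hypothetical): drop one field from each document.
  static UnaryOperator<Map<String, String>> removeField(String field) {
    return doc -> {
      Map<String, String> out = new HashMap<>(doc);
      out.remove(field);
      return out;
    };
  }
}
```

The point of the separate `swapIfValid` step is the same as doing the switch by hand: nothing the old index serves changes until someone has looked at the result.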

I can see the benefits of both approaches, but I think the addIndexes approach is safer,
as it's not 'online' and does not change the source directory. I'm not sure how 'online' this
process needs to be, though. How often do you remove fields or change index options? That's
a fairly serious decision IMO, and should be made w/ care and lots of testing. Doing it
in-place may be dangerous.

About the patch: it's very simple and clean, which is a good thing! I would make RemoveFieldReader
extend FilterAtomicReader to save you some lines of code, even though it's just a test class.
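The reason extending a filter base class saves code is that the base class already delegates every method to the wrapped reader, so a subclass overrides only what it changes. A minimal sketch of that pattern, using a hypothetical, much simplified `DocReader` interface (this is NOT Lucene's `AtomicReader`/`FilterAtomicReader` API, just the same decorator idea):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified reader interface (not Lucene's AtomicReader).
interface DocReader {
  int maxDoc();
  Set<String> fieldNames();
  Map<String, String> document(int docId);
}

// Delegates every call to the wrapped reader, in the spirit of FilterAtomicReader.
class FilterDocReader implements DocReader {
  protected final DocReader in;
  FilterDocReader(DocReader in) { this.in = in; }
  @Override public int maxDoc() { return in.maxDoc(); }
  @Override public Set<String> fieldNames() { return in.fieldNames(); }
  @Override public Map<String, String> document(int docId) { return in.document(docId); }
}

// Overrides only the methods that expose field data; maxDoc() is inherited unchanged.
class RemoveFieldDocReader extends FilterDocReader {
  private final String removed;
  RemoveFieldDocReader(DocReader in, String removed) { super(in); this.removed = removed; }

  @Override public Set<String> fieldNames() {
    Set<String> names = new TreeSet<>(in.fieldNames());
    names.remove(removed);
    return names;
  }

  @Override public Map<String, String> document(int docId) {
    Map<String, String> doc = new HashMap<>(in.document(docId));
    doc.remove(removed);
    return doc;
  }
}

// Simple in-memory source reader used as the wrapped input.
class ListDocReader implements DocReader {
  private final List<Map<String, String>> docs;
  ListDocReader(List<Map<String, String>> docs) { this.docs = docs; }
  @Override public int maxDoc() { return docs.size(); }
  @Override public Set<String> fieldNames() {
    Set<String> names = new TreeSet<>();
    for (Map<String, String> d : docs) names.addAll(d.keySet());
    return names;
  }
  @Override public Map<String, String> document(int docId) { return docs.get(docId); }
}
```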

If you do want to continue w/ the online filtering approach (and others agree), then instead
of introducing a MergedSegmentFilter, perhaps we could make SegmentMerger pluggable, with a few extension
points that let you allocate your own AtomicReader ... Just a thought; I know it's not
directly related to this issue, but if we're going to open segment merging up for some serious
hacking, let's do it properly :).
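A rough sketch of what such an extension point could look like, in a toy model where a segment is just a list of documents. `ToyMerger` and `ReaderFactory` are hypothetical names and say nothing about SegmentMerger's actual internals; the only idea shown is that the merger asks a pluggable hook for the view of each source segment before merging it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical toy merger (not Lucene's SegmentMerger): a segment is a list of
// documents, each mapping field name -> value.
final class ToyMerger {
  // Hypothetical extension point: given a source segment, return the view to merge.
  interface ReaderFactory extends UnaryOperator<List<Map<String, String>>> {}

  // Default hook: merge every segment unchanged.
  static final ReaderFactory IDENTITY = segment -> segment;

  private final ReaderFactory factory;

  ToyMerger(ReaderFactory factory) { this.factory = factory; }

  // Merges segments in order, letting the hook wrap or replace each one first.
  List<Map<String, String>> merge(List<List<Map<String, String>>> segments) {
    List<Map<String, String>> merged = new ArrayList<>();
    for (List<Map<String, String>> segment : segments) {
      merged.addAll(factory.apply(segment));
    }
    return merged;
  }
}
```

With `IDENTITY` the merge behaves as it does today; a field-pruning or type-converting hook changes what gets merged while the merger keeps control of how.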
                
> Support Filtering Segments During Merge
> ---------------------------------------
>
>                 Key: LUCENE-4560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4560
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
>
>
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, a full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields have different options (the most restrictive option is forced during merge).
> Being able to filter segments during merges will allow gradually migrating indexed data to new index settings, and gradually pruning/enhancing existing data.
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually remove indexed fields that are no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> A patch will be forthcoming.
