lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4560) Support Filtering Segments During Merge
Date Sun, 18 Nov 2012 19:02:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499876#comment-13499876
] 

Shai Erera commented on LUCENE-4560:
------------------------------------

Thinking about this some more, I really don't think it's a 'gradual' thing that you do to
the index:

* Depending on the state of the index, this migration may never happen for some segments,
typically very large ones that are no longer picked for merge. So what will happen is
that you'll have code in your app that will never be invoked after some time ... not a good
sign to me.

* I won't want to have code in my app that lives there forever. Rather, I'd like to make a
decision to remove field 'foo', run the process which removes it once, and be done with it,
moving the code to some "tools" area that is never run again.
** With your approach, RemoveFieldReader will not go away unless you can guarantee it ran
on all segments, which amounts to forcing forceMerge(1) to run (note that it may not do what you
want, depending on the MergePolicy settings!), which is really like addIndexes.
** Worse, today it's RemoveFieldReader, and tomorrow it will turn into RemoveFieldAndMigrateIndexOptionsReader,
because as I wrote above, you cannot stop running that code if you cannot ensure that all
segments have been migrated.

So I'm beginning to think that this process should not be an incremental/gradual/online thing,
but rather an addIndexes type of process that you run once, knowing that you're done with
it until the next time you need to rewrite the index, w/o actually re-indexing the
content.
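
To make the argument concrete, here is a toy model in plain Java. It uses no Lucene classes: Segment, mergeWithFilter, rewriteOnce and the maxMergeDocs cutoff are all names made up for this illustration, and the "merge policy" is reduced to a single size threshold. It is only a sketch of the reasoning, not of how IndexWriter or a real MergePolicy behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for a segment: a document count plus the field names it contains.
// This is NOT a Lucene class.
final class Segment {
    final int docCount;
    final Set<String> fields;

    Segment(int docCount, Set<String> fields) {
        this.docCount = docCount;
        this.fields = new TreeSet<>(fields);
    }
}

final class MigrationSketch {

    // "Gradual" approach: the field is dropped only from segments that get merged.
    // Assumption: segments larger than maxMergeDocs are never picked for merge.
    static List<Segment> mergeWithFilter(List<Segment> index, int maxMergeDocs, String fieldToRemove) {
        List<Segment> result = new ArrayList<>();
        int mergedDocs = 0;
        Set<String> mergedFields = new TreeSet<>();
        for (Segment s : index) {
            if (s.docCount > maxMergeDocs) {
                result.add(s); // too large: never merged, so never filtered
            } else {
                mergedDocs += s.docCount;
                mergedFields.addAll(s.fields);
            }
        }
        if (mergedDocs > 0) {
            mergedFields.remove(fieldToRemove);
            result.add(new Segment(mergedDocs, mergedFields));
        }
        return result;
    }

    // "Run once" approach: every segment passes through the filter, regardless of size.
    static List<Segment> rewriteOnce(List<Segment> index, String fieldToRemove) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : index) {
            Set<String> kept = new TreeSet<>(s.fields);
            kept.remove(fieldToRemove);
            result.add(new Segment(s.docCount, kept));
        }
        return result;
    }

    // True if any segment still carries the field, i.e. the filter code cannot be retired yet.
    static boolean stillHasField(List<Segment> index, String field) {
        for (Segment s : index) {
            if (s.fields.contains(field)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, an index with one very large segment still contains 'foo' after mergeWithFilter, so the filter has to stay in the app; after rewriteOnce no segment contains it and the code can move to the "tools" area.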

BTW, did you take a look at LUCENE-2632? It is about adding a FilteringCodec which filters
the data that it writes/reads. Could it help you here? If so, I think it has a better chance
of getting committed than the approach in this issue (Codecs are already an extension point...).
                
> Support Filtering Segments During Merge
> ---------------------------------------
>
>                 Key: LUCENE-4560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4560
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
>
>
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields have different
options (most restrictive option is forced during merge)
> Being able to filter segments during merges will allow gradually migrating indexed data
to new index settings, support pruning/enhancing existing data gradually
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually Remove index fields no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> patch will be forthcoming

