lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4560) Support Filtering Segments During Merge
Date Mon, 19 Nov 2012 14:46:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500273#comment-13500273 ]

Uwe Schindler commented on LUCENE-4560:
---------------------------------------

bq. Offhand, it looks like this may still require a patch, as the SegmentMerger is currently
only aware of SegmentReaders for merging,

This is not true: you can merge any atomic reader! SegmentMerger may have some optimizations for
SegmentReaders, but in general any type of atomic reader can be merged into an index (e.g. with
addIndexes(IndexReader...), which is what Shai and I proposed).
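The point that merging works against the atomic-reader abstraction, with SegmentReader merely
getting a fast path, can be illustrated with a toy model. This is a minimal sketch in plain Java,
not Lucene code: LeafView, SegmentLeaf and ToyMerger are hypothetical stand-ins for AtomicReader,
SegmentReader and SegmentMerger, and a "document" is just a map of stored fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an atomic (leaf) reader. */
interface LeafView {
    int maxDoc();
    Map<String, String> document(int docId);
}

/** Hypothetical stand-in for a SegmentReader: a leaf backed by a flushed segment. */
final class SegmentLeaf implements LeafView {
    final List<Map<String, String>> docs;

    SegmentLeaf(List<Map<String, String>> docs) {
        this.docs = List.copyOf(docs);
    }

    @Override public int maxDoc() { return docs.size(); }
    @Override public Map<String, String> document(int docId) { return docs.get(docId); }
}

/** Toy merger: accepts ANY LeafView; SegmentLeaf only gets a faster bulk-copy path. */
final class ToyMerger {
    int bulkCopies = 0;
    int perDocCopies = 0;

    List<Map<String, String>> merge(List<? extends LeafView> leaves) {
        List<Map<String, String>> merged = new ArrayList<>();
        for (LeafView leaf : leaves) {
            if (leaf instanceof SegmentLeaf seg) {
                merged.addAll(seg.docs);  // optimization for the concrete segment type
                bulkCopies++;
            } else {
                for (int i = 0; i < leaf.maxDoc(); i++) {
                    merged.add(leaf.document(i));  // generic path: works for any leaf
                }
                perDocCopies++;
            }
        }
        return merged;
    }
}
```

A wrapper around a SegmentLeaf (for example one that rewrites or hides a field) is no longer a
SegmentLeaf, so it loses the fast path but is still merged correctly through the generic one.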

bq. Also, I argue that any addIndexes() approach is even more dangerous and just as prone
to corruption.
This can result in the same filtering of readers as the attached patch provides; however, it
modifies the entire index, thereby causing any corruption to be much more widespread. (Of
course, either way it is up to the person implementing their custom filter to guarantee that
no corruption occurs and that their code produces consistent indexes.)

Read my comment carefully: you can just trigger a merge of the segments that you really want to
change. The code would look like this:
- open an IndexWriter
- get an NRT reader and its atomic leaves: DirectoryReader.open(writer, true).leaves()
- select the leaves that you are interested in (e.g. by inspecting the metadata and version
numbers in the leaves' SegmentInfo; this assumes they are SegmentReaders -> instanceof check)
- wrap all leaves that you want to change with your custom filter
- delete all documents using IndexWriter.deleteAll()
- call addIndexes and pass your list of partially wrapped atomic leaves
- commit
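The steps above can be mimicked in a toy, in-memory model to make the data flow explicit. This is
a sketch under stated assumptions, not Lucene API: Leaf, ToyIndex and Migration are hypothetical;
snapshot() plays the role of the NRT reader's leaves, relying (as the recipe does) on that
point-in-time view staying readable after deleteAll(); and dropField copies documents eagerly,
where a real implementation would lazily wrap the leaf (e.g. in a FilterAtomicReader subclass).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical leaf: stored documents plus the version recorded in its segment metadata. */
record Leaf(String version, List<Map<String, String>> docs) {}

/** Hypothetical in-memory index standing in for an IndexWriter over a Directory. */
final class ToyIndex {
    private final List<Leaf> segments = new ArrayList<>();
    private List<Leaf> committed = List.of();

    void add(Leaf leaf) { segments.add(leaf); }

    /** Point-in-time copy of the current segments, like an NRT reader's leaves(). */
    List<Leaf> snapshot() { return List.copyOf(segments); }

    void deleteAll() { segments.clear(); }

    /** Merges all given leaves into ONE new segment, similar in effect to forceMerge(1). */
    void addIndexes(List<Leaf> leaves, String newVersion) {
        List<Map<String, String>> merged = new ArrayList<>();
        for (Leaf leaf : leaves) {
            merged.addAll(leaf.docs());
        }
        segments.add(new Leaf(newVersion, merged));
    }

    void commit() { committed = List.copyOf(segments); }

    List<Leaf> committed() { return committed; }
}

final class Migration {
    /** Hypothetical "custom filter": returns a copy of the leaf without the given field. */
    static Leaf dropField(Leaf leaf, String field) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, String> doc : leaf.docs()) {
            Map<String, String> copy = new LinkedHashMap<>(doc);
            copy.remove(field);
            docs.add(copy);
        }
        return new Leaf(leaf.version(), docs);
    }

    /** Follows the recipe: snapshot, select, wrap, deleteAll, addIndexes, commit. */
    static void migrate(ToyIndex index, Predicate<Leaf> needsChange, String field, String newVersion) {
        List<Leaf> prepared = new ArrayList<>();
        for (Leaf leaf : index.snapshot()) {                 // "NRT reader" leaves
            prepared.add(needsChange.test(leaf) ? dropField(leaf, field) : leaf);
        }
        index.deleteAll();                                    // snapshot is unaffected
        index.addIndexes(prepared, newVersion);               // partially wrapped leaves
        index.commit();
    }
}
```

Only the leaves matching the predicate go through the filter; the others pass through unchanged,
but every document still ends up in the single new segment.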

This will do something like a forceMerge(1): your resulting index will have one segment
(it is optimized). This approach is as heavy as your merge approach, because with your do-it-on-merge
approach you at least have to force a merge of all segments to upgrade your index (e.g. by calling forceMerge(1)).
                
> Support Filtering Segments During Merge
> ---------------------------------------
>
>                 Key: LUCENE-4560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4560
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
>
>
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, a full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields have different
> options (the most restrictive option is forced during merge).
> Being able to filter segments during merges will allow gradually migrating indexed data
> to new index settings, and support pruning/enhancing existing data gradually.
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually Remove index fields no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> Patch will be forthcoming.
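The issue's remark that the most restrictive option is forced when segments with differing field
options are merged can be shown with a small model. This is a hypothetical sketch: the enum only
mirrors the constant names of Lucene 4.x FieldInfo.IndexOptions, ordered from least to most
information, and mergedOption simply keeps the least capable option, as a simplification of the
behaviour the issue describes.

```java
import java.util.List;

/** Mirrors the constant names of Lucene 4.x FieldInfo.IndexOptions, least capable first. */
enum IndexOptions {
    DOCS_ONLY,
    DOCS_AND_FREQS,
    DOCS_AND_FREQS_AND_POSITIONS,
    DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
}

final class FieldOptionsMerge {
    /** Options for one field in the merged segment: the most restrictive across all inputs. */
    static IndexOptions mergedOption(List<IndexOptions> perSegment) {
        if (perSegment.isEmpty()) {
            throw new IllegalArgumentException("no segments to merge");
        }
        IndexOptions result = perSegment.get(0);
        for (IndexOptions option : perSegment) {
            if (option.compareTo(result) < 0) {
                result = option;  // a less capable option wins
            }
        }
        return result;
    }
}
```

In this model, once any input segment indexes a field as DOCS_ONLY, positions and frequencies for
that field are gone from the merged segment, which is the kind of loss a merge-time filter is meant
to let users manage.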

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


