lucene-dev mailing list archives

From "Tim Smith (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4560) Support Filtering Segments During Merge
Date Mon, 19 Nov 2012 15:34:58 GMT


Tim Smith commented on LUCENE-4560:

A migration strategy does exist and is very simple. It is up to the implementer to determine
how data will be migrated and to communicate that clearly to the user base so expectations are
set properly. Every migration approach has pros and cons, and may require gradual reindexing of
content to ensure consistency for old documents. But this is up to the implementer and shouldn't
be imposed by the Lucene APIs.

Let's analyze the highlighting case based on indexed offsets.

Assume documents were indexed with no offsets.
Highlighting was being done for these documents using tokenstream-based highlighting over
the stored field text.

Now the user switches to more efficient offsets-based highlighting.
New documents will be indexed with offsets.

Right now, assuming no merging has been done, it is very easy to see whether a document has
indexed offsets, and each document can be highlighted according to what was indexed for it.

Then a merge happens. (Currently, this forces tokenstream-based highlighting for all documents,
undoing the configuration setting, because the most restrictive option wins during merge.)
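
The downgrade on merge can be sketched roughly as follows. This is only an illustration under
stated assumptions, not Lucene code: the Options enum is a hypothetical stand-in ordered from
least to most detailed, and mergedOptions() simply takes the minimum across the segments being
merged, mirroring the "most restrictive option is forced during merge" behavior described in
the issue.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: hypothetical types, not the Lucene API. */
final class MergeDowngradeSketch {

    /** Hypothetical per-field index options, ordered from least to most detailed. */
    enum Options { DOCS, DOCS_FREQS, DOCS_FREQS_POSITIONS, DOCS_FREQS_POSITIONS_OFFSETS }

    /** Returns the most restrictive (lowest-ordinal) option among the segments being merged. */
    static Options mergedOptions(List<Options> perSegment) {
        if (perSegment.isEmpty()) {
            throw new IllegalArgumentException("no segments to merge");
        }
        Options result = perSegment.get(0);
        for (Options o : perSegment) {
            if (o.compareTo(result) < 0) {
                result = o;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One old segment without offsets is enough to drop offsets for the merged segment.
        List<Options> segments = Arrays.asList(
                Options.DOCS_FREQS_POSITIONS_OFFSETS,
                Options.DOCS_FREQS_POSITIONS,
                Options.DOCS_FREQS_POSITIONS_OFFSETS);
        System.out.println(mergedOptions(segments));
    }
}
```

A single segment without offsets is enough to make the merged segment lose them, which is why
the per-document check stops working after a merge.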

If a migration policy is applied, old documents can have 0,0 offsets applied. (This is the
decision of the migration policy and is up to its implementer.)
Now, when highlighting is applied, if all positions for a document have a 0,0 offset, the
highlighter can fall back to tokenstream-based highlighting.
If positions have real offsets, it will use them to perform optimal, full-featured highlighting.
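
A minimal sketch of that policy, assuming hypothetical types rather than the Lucene API: the
Offset class, placeholderOffsets() and choose() are names invented here for illustration. The
migration step fills in 0,0 for every position of an old document, and the highlight-time check
uses offsets only when at least one position carries a real offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical types, not the Lucene API. */
final class OffsetFallbackSketch {

    enum Strategy { OFFSETS, TOKEN_STREAM }

    /** Hypothetical start/end character offsets recorded for one term position. */
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Migration step: a document indexed without offsets gets a 0,0 placeholder per position. */
    static List<Offset> placeholderOffsets(int positionCount) {
        List<Offset> out = new ArrayList<>();
        for (int i = 0; i < positionCount; i++) {
            out.add(new Offset(0, 0));
        }
        return out;
    }

    /** Highlight time: use offsets only if at least one position has a real (non 0,0) offset. */
    static Strategy choose(List<Offset> docOffsets) {
        for (Offset o : docOffsets) {
            if (o.start != 0 || o.end != 0) {
                return Strategy.OFFSETS;
            }
        }
        return Strategy.TOKEN_STREAM;
    }

    public static void main(String[] args) {
        List<Offset> oldDoc = placeholderOffsets(3);   // migrated document
        List<Offset> newDoc = new ArrayList<>();       // document indexed with offsets
        newDoc.add(new Offset(0, 5));
        newDoc.add(new Offset(6, 11));
        System.out.println(choose(oldDoc));
        System.out.println(choose(newDoc));
    }
}
```

The check is per document, so old and new documents can live in the same merged segment and
still be highlighted by the appropriate path.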

This will result in slightly slower highlighting for old documents.
The user experience can then be improved by gradually reindexing old documents, without
requiring users to blast away their existing index.

> Support Filtering Segments During Merge
> ---------------------------------------
>                 Key: LUCENE-4560
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields have different
> options (most restrictive option is forced during merge)
> Being able to filter segments during merges will allow gradually migrating indexed data
> to new index settings and support pruning/enhancing existing data gradually
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually Remove index fields no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> patch will be forthcoming
