lucene-dev mailing list archives

From "Tim Smith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4557) Indexed Offsets Can Be Lost During Merge
Date Wed, 14 Nov 2012 17:54:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497278#comment-13497278 ]

Tim Smith commented on LUCENE-4557:
-----------------------------------

I understand your aversion to what I suggest; however, I still argue this is a pretty nasty
bug, given that indexed content is lost.

I also argue that changing a field's settings over time should be fully supported,
especially making the field more general (adding positions/offsets/insertnewfeaturehere).
Old data would of course be limited to the settings it was indexed with. However, new
content should not be restricted to old settings.

Without supporting this, you are forcing full reindexes in situations that really should not
require them. This is a big red flag in my opinion.
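
To make the distinction concrete, here is a minimal sketch in plain Java. It does not use
Lucene's API: the Options enum is a local stand-in ordered like the IndexOptions constants
named in the issue, and leastCapable/mostCapable are hypothetical helper names. It assumes
the loss described in the issue behaves like taking the least capable setting across
segments, and contrasts that with the "more general wins" rule argued for above.

```java
// Hypothetical model only: a local enum standing in for IndexOptions,
// ordered from least to most capable.
final class FieldOptionsMergeSketch {

    enum Options {
        DOCS_ONLY,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    // Assumed model of the behavior reported in the issue:
    // the merged field keeps only what every segment has.
    static Options leastCapable(Options a, Options b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    // The behavior argued for: the merged field keeps the most general
    // setting; older segments would need placeholder data (e.g. 0,0 offsets).
    static Options mostCapable(Options a, Options b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        Options oldSegment = Options.DOCS_AND_FREQS_AND_POSITIONS;
        Options newSegment = Options.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
        System.out.println("downgrade rule: " + leastCapable(oldSegment, newSegment));
        System.out.println("upgrade rule:   " + mostCapable(oldSegment, newSegment));
    }
}
```

Under the first rule a single segment without offsets is enough to strip them from the
merged result, which is the scenario in the issue description.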


From what I understand of your FilterReader suggestion, it would require me to do the
equivalent of an index optimize in order to "upgrade/convert" the index to have (0,0)
offsets on segments that were lacking this setting?

This seems extremely expensive. It would require me to detect this situation at index
startup time and then spend a very large amount of time performing the conversion, all
while blocking indexing until the operation is over.

Merge time seems to be the appropriate place to control this behavior.
As long as I could control the merge behavior via a pluggable/configurable API I would be
happy, and any other users that encounter this issue would also have a means to address it.
It looks like merging of segment data is not exposed at all, so right now there is no way
to handle this situation properly.

For instance, if I could wrap the SegmentReader at merge time to provide null offsets, that
would be fine. Ideally, there would be some means to still support efficient bulk merging
of stored fields, term vectors, etc.
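
As a rough illustration of the kind of pluggable merge-time hook meant here, the following
is a plain-Java sketch, not Lucene code. Occurrence, MissingOffsetsPolicy, ZERO_PAD and
merge are all hypothetical names; a real implementation would have to sit wherever the
merge reads postings from each segment. The point is only that a segment lacking offsets
gets placeholder values during the merge, instead of offsets being dropped for every
segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a merge that pads missing offsets instead of dropping them.
final class OffsetPaddingMergeSketch {

    /** One term occurrence; offsets is null when the segment did not index them. */
    static final class Occurrence {
        final int position;
        final int[] offsets; // {start, end}, or null if not indexed

        Occurrence(int position, int[] offsets) {
            this.position = position;
            this.offsets = offsets;
        }
    }

    /** Hypothetical plug-in point: what to emit for an occurrence without offsets. */
    interface MissingOffsetsPolicy {
        int[] offsetsFor(Occurrence occurrence);
    }

    /** Policy matching the proposal: older data reports offsets of (0, 0). */
    static final MissingOffsetsPolicy ZERO_PAD = occurrence -> new int[] {0, 0};

    /** Concatenates segments, filling in offsets only where they are missing. */
    static List<Occurrence> merge(List<List<Occurrence>> segments,
                                  MissingOffsetsPolicy policy) {
        List<Occurrence> merged = new ArrayList<>();
        for (List<Occurrence> segment : segments) {
            for (Occurrence o : segment) {
                int[] offsets = (o.offsets != null) ? o.offsets : policy.offsetsFor(o);
                merged.add(new Occurrence(o.position, offsets));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Old segment indexed without offsets, new segment indexed with them.
        List<Occurrence> oldSegment = Arrays.asList(
                new Occurrence(0, null), new Occurrence(1, null));
        List<Occurrence> newSegment = Arrays.asList(
                new Occurrence(0, new int[] {0, 5}));

        for (Occurrence o : merge(Arrays.asList(oldSegment, newSegment), ZERO_PAD)) {
            System.out.println(o.position + " -> (" + o.offsets[0] + ", " + o.offsets[1] + ")");
        }
    }
}
```

This per-occurrence loop sidesteps the bulk-merge concern: a real hook would ideally apply
only to postings, leaving stored fields and term vectors on their bulk-copy path.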

> Indexed Offsets Can Be Lost During Merge
> ----------------------------------------
>
>                 Key: LUCENE-4557
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4557
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Tim Smith
>         Attachments: OffsetsTest.java
>
>
> Primary Use case:
> Start with pre-4.0 index (no indexed offsets available)
> Start indexing new documents with indexed offsets
> (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, previously was
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS)
> merge/optimize index
> newly indexed documents will now no longer have offsets available
> In general, it is impossible to ever change a field to have offsets indexed when starting
> with an existing index as a merge will cause offsets to be removed from the index.
> Desirable behavior would be for new documents to have offsets indexed properly, and old
> documents would have offset of "0, 0" for all positions after merging with a segment that
> contains offsets
> Current behavior can be very dangerous.
> for example:
> * Start indexing documents with indexed offsets
> * change config to not index offsets by accident
> * index 1 document
> * revert config back
> * offsets will start disappearing from documents as segments are merged

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
