lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4557) Indexed Offsets Can Be Lost During Merge
Date Wed, 14 Nov 2012 16:26:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497195#comment-13497195 ]

Robert Muir commented on LUCENE-4557:
-------------------------------------

Seriously, the behavior is no different here than omitTF has been throughout past releases.

The only thing I don't like is that IndexWriter doesn't throw an exception if you try to add
a field with incompatible indexing properties (e.g. you try to turn on offsets when they are
already off, or you try to add norms when they are off, or you try to index with positions
when you previously omitted TF).
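
As an illustration of the stricter behavior described above, here is a minimal sketch of a
guard that remembers how each field was first indexed and rejects any later change. It is
not IndexWriter code: StrictFieldSchema and check() are hypothetical names, the enum only
mirrors the constant names of Lucene 4.0's IndexOptions, and treating every change (not
just an upgrade) as an error is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class StrictFieldSchema {
    /** Local stand-in that mirrors the names of Lucene 4.0's IndexOptions (no Lucene dependency). */
    public enum IndexOptions {
        DOCS_ONLY,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    // Options recorded the first time each field name was seen.
    private final Map<String, IndexOptions> seen = new HashMap<>();

    /** Records the options on first use of a field; throws if a later call asks for different ones. */
    public void check(String field, IndexOptions requested) {
        IndexOptions existing = seen.putIfAbsent(field, requested);
        if (existing != null && existing != requested) {
            throw new IllegalArgumentException(
                "field \"" + field + "\" was indexed with " + existing
                + "; cannot change to " + requested);
        }
    }

    public static void main(String[] args) {
        StrictFieldSchema schema = new StrictFieldSchema();
        schema.check("body", IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
        try {
            // Trying to turn offsets on for a field that was indexed without them.
            schema.check("body", IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A guard like this fails at the point of the mistake instead of letting a later merge
quietly change what the index contains.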

Adding fake data is out of the question: if you want to populate your index with bogus offsets,
then make a BogusOffsetsFilterReader, call addIndexes, and rewrite your postings with this
bogus data.

Then run CheckIndex: we are pretty picky about what the offset values can be.

                
> Indexed Offsets Can Be Lost During Merge
> ----------------------------------------
>
>                 Key: LUCENE-4557
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4557
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Tim Smith
>         Attachments: OffsetsTest.java
>
>
> Primary Use case:
> Start with pre-4.0 index (no indexed offsets available)
> Start indexing new documents with indexed offsets (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, previously was IndexOptions.DOCS_AND_FREQS_AND_POSITIONS)
> merge/optimize index
> newly indexed documents will now no longer have offsets available
> In general, it is impossible to ever change a field to have offsets indexed when starting with an existing index as a merge will cause offsets to be removed from the index.
> Desirable behavior would be for new documents to have offsets indexed properly, and old documents would have offset of "0, 0" for all positions after merging with a segment that contains offsets
> Current behavior can be very dangerous.
> for example:
> * Start indexing documents with indexed offsets
> * change config to not index offsets by accident
> * index 1 document
> * revert config back
> * offsets will start disappearing from documents as segments are merged
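
The scenario in the list above can be modeled in a few lines. This is a simplified sketch
of the reported behavior, not Lucene's merge code: it assumes a merged segment keeps only
the weakest index options found among its inputs, which is what the report describes.
MergeDowngradeModel and merged() are hypothetical names, and the enum mirrors the constant
names of Lucene 4.0's IndexOptions.

```java
import java.util.Arrays;
import java.util.List;

public class MergeDowngradeModel {
    /** Local stand-in for Lucene 4.0's IndexOptions, declared from weakest to strongest. */
    public enum IndexOptions {
        DOCS_ONLY,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    /** Returns the weakest options among the segments being merged (assumed merge rule). */
    public static IndexOptions merged(List<IndexOptions> segments) {
        if (segments.isEmpty()) {
            throw new IllegalArgumentException("nothing to merge");
        }
        IndexOptions result = segments.get(0);
        for (IndexOptions options : segments) {
            // Enum order encodes strength, so compareTo finds the weaker of the two.
            if (options.compareTo(result) < 0) {
                result = options;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three segments indexed with offsets, plus one document indexed without them.
        List<IndexOptions> segments = Arrays.asList(
            IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
            IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
            IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
            IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
        System.out.println(merged(segments)); // prints DOCS_AND_FREQS_AND_POSITIONS
    }
}
```

Under this rule a single segment written without offsets is enough to drop offsets from
everything it is merged with.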

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

