lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: MultiFields#getTerms docs clarification
Date Wed, 31 Aug 2016 04:33:18 GMT
On Mon, Aug 29, 2016 at 8:23 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Seems like you need to scrutinize exactly what documents were indexed in step 3?
>
> How exactly did you copy documents out of the old index?  Note that
> when Lucene's IndexReader returns a Document, it's not the same
> Document that was indexed in the first place: it will only have fields
> that were stored, and it does not store certain metadata about how
> those field values were indexed.  But I don't see how that alone can
> lead to indexing an empty string token.

The root cause is that some older version of our code apparently *did*
index an empty field, a bug that someone else had since fixed. I
verified that this empty field was in fact present in the stored
fields for the document before the index was migrated to Lucene 5.

So the only obvious difference is that Lucene 3 indexed no tokens for
this field, whereas Lucene 5 indexes a single empty token?

I have ended up putting in a migration to delete the spurious empty
term in the postings as well as deleting the empty field from all the
documents where it's present.
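The message doesn't show the migration itself. Below is a minimal sketch of one way to do it, written against the Lucene 5.x API (`RAMDirectory`, `int TopDocs.totalHits`). The field names `id` and `tag`, the class and helper names, and the assumption that the offending field was indexed as an untokenized `StringField` are all hypothetical. Existing postings can't be edited in place, so the sketch re-indexes each affected document without the field via `IndexWriter.updateDocument`, keyed on an assumed unique `id`. It rebuilds each document with explicit field types rather than re-adding the stored `Document` from the reader, because of the caveat quoted above: stored documents come back without their original indexing metadata.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class EmptyTermCleanup {
    // Hypothetical schema: a unique stored "id" plus a stored, untokenized "tag".
    static final String ID = "id";
    static final String TAG = "tag";

    static Document makeDoc(String id, String tag) {
        Document doc = new Document();
        doc.add(new StringField(ID, id, Field.Store.YES));
        if (tag != null) {
            // StringField indexes the whole value as one term, so "" yields an empty term.
            doc.add(new StringField(TAG, tag, Field.Store.YES));
        }
        return doc;
    }

    /** Counts live documents whose TAG postings contain the empty term. */
    static int countEmptyTag(Directory dir) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Searching respects deletions, unlike IndexReader.docFreq.
            return searcher.search(new TermQuery(new Term(TAG, "")), 1).totalHits;
        }
    }

    /** Re-indexes every document that has an empty TAG, dropping the TAG field. */
    static void removeEmptyTags(Directory dir, IndexWriter writer) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            int n = Math.max(1, reader.maxDoc()); // search() rejects n == 0
            TopDocs hits = searcher.search(new TermQuery(new Term(TAG, "")), n);
            for (ScoreDoc sd : hits.scoreDocs) {
                String id = searcher.doc(sd.doc).get(ID);
                // Rebuild with explicit field types instead of re-adding the stored Document.
                writer.updateDocument(new Term(ID, id), makeDoc(id, null));
            }
        }
        writer.commit(); // make the deletions and re-adds visible to new readers
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = new RAMDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(makeDoc("1", ""));
            writer.addDocument(makeDoc("2", "red"));
            writer.addDocument(makeDoc("3", ""));
            writer.commit();
            System.out.println("before: " + countEmptyTag(dir));
            removeEmptyTags(dir, writer);
            System.out.println("after: " + countEmptyTag(dir));
        }
    }
}
```

A real migration would need to rebuild every field of the application's own schema, not just `id`, and on Lucene 8+ `totalHits` is no longer an `int`.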

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

