lucene-java-user mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: Dealing with index format changes
Date Sun, 29 Jan 2017 19:31:30 GMT
As far as I remember, this is a Luke-related display bug (the nulls shown in the output for
position increments greater than 1).

The phrase query question is unrelated to the display bug; this is documented in the
migration guide.

The problem is that old indexes can't work with the new phrase query and query parser
behaviour, so you have to switch off position increments in the query parser.
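
To make the mismatch concrete, here is a toy model in plain Java (hypothetical, not Lucene's API). An old 3.6-style index packs the surviving tokens together, while a query analyzed with position increments leaves a hole where the stop word was, so an exact phrase no longer lines up. The class `PositionGapDemo`, its methods and the stop list are made up for illustration; in Lucene 4.x the actual switch is the classic query parser's `setEnablePositionIncrements(false)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified model (not Lucene code) of stop-word position gaps.
public class PositionGapDemo {

    // Illustrative stop list; a real analysis chain defines its own.
    static final Set<String> STOP_WORDS = Set.of("for", "a", "the");

    // A surviving term and its position in the field.
    record Token(String term, int position) {}

    // Drops stop words. With increments enabled, each removed stop word adds
    // to the next token's position (a gap); with them disabled, tokens are packed.
    static List<Token> analyze(String text, boolean enablePositionIncrements) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;
        int pendingGap = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (STOP_WORDS.contains(word)) {
                pendingGap++; // remember the skipped slot
                continue;
            }
            position += 1 + (enablePositionIncrements ? pendingGap : 0);
            pendingGap = 0;
            tokens.add(new Token(word, position));
        }
        return tokens;
    }

    // Exact phrase (slop 0): every query term must sit at the same relative
    // offset from the first query term as it does in the document.
    static boolean phraseMatches(List<Token> doc, List<Token> query) {
        if (query.isEmpty()) return false;
        Token first = query.get(0);
        for (Token start : doc) {
            if (!start.term().equals(first.term())) continue;
            boolean allFound = true;
            for (Token q : query) {
                int wanted = start.position() + (q.position() - first.position());
                if (!doc.contains(new Token(q.term(), wanted))) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String phrase = "thanks for smiles";
        List<Token> oldIndex = analyze(phrase, false);         // thanks@0, smiles@1
        List<Token> queryWithGaps = analyze(phrase, true);     // thanks@0, smiles@2
        List<Token> queryWithoutGaps = analyze(phrase, false); // thanks@0, smiles@1
        System.out.println(phraseMatches(oldIndex, queryWithGaps));    // false
        System.out.println(phraseMatches(oldIndex, queryWithoutGaps)); // true
    }
}
```

Turning increments off on the query side simply makes the query pack its terms the same way the old index did.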

But if you reindex your content with a correctly working Lucene 4 stop filter, you will also
have working phrases with position increments.
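
A rough sketch of why reindexing helps, assuming the same gap-preserving analysis is used at both index and query time. Again this is a hypothetical model, not Lucene code: `ReindexDemo`, `analyzeWithGaps`, `index`, `phraseMatches` and the stop list are invented for illustration. The document is stored as term-to-positions postings, and a stop word simply leaves its position unused.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical positional-index model (not Lucene code).
public class ReindexDemo {

    // Illustrative stop list only.
    static final Set<String> STOP_WORDS = Set.of("for", "a", "the");

    // A surviving term and its original word position.
    record Posting(String term, int position) {}

    // Keeps each non-stop word at its original position, so removed stop words leave holes.
    static List<Posting> analyzeWithGaps(String text) {
        List<Posting> out = new ArrayList<>();
        String[] words = text.toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            if (!STOP_WORDS.contains(words[pos])) {
                out.add(new Posting(words[pos], pos));
            }
        }
        return out;
    }

    // Positional index for one document: term -> positions.
    static Map<String, TreeSet<Integer>> index(String text) {
        Map<String, TreeSet<Integer>> postings = new HashMap<>();
        for (Posting p : analyzeWithGaps(text)) {
            postings.computeIfAbsent(p.term(), k -> new TreeSet<>()).add(p.position());
        }
        return postings;
    }

    // Exact phrase: the query is analyzed with the same gaps, then offsets must line up.
    static boolean phraseMatches(Map<String, TreeSet<Integer>> doc, String phrase) {
        List<Posting> query = analyzeWithGaps(phrase);
        if (query.isEmpty()) return false;
        Posting first = query.get(0);
        for (int start : doc.getOrDefault(first.term(), new TreeSet<>())) {
            boolean allFound = true;
            for (Posting q : query) {
                int wanted = start + (q.position() - first.position());
                if (!doc.getOrDefault(q.term(), new TreeSet<>()).contains(wanted)) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Integer>> doc = index("thanks for smiles"); // thanks@0, smiles@2
        System.out.println(phraseMatches(doc, "thanks for smiles")); // true
        System.out.println(phraseMatches(doc, "thanks smiles"));     // false
    }
}
```

In this model the trade-off is that a phrase typed without the stop word no longer matches exactly, because the gap is now part of the indexed positions.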

Uwe

On 29 January 2017 19:46:02 CET, Adrien Grand <jpountz@gmail.com> wrote:
>Hi Chris, this "null_1" token is unexpected to me. Did you reindex with
>Lucene 4.10.3 or just upgrade to the new file format using merging? Can
>you also share your analysis chain?
>
>On Sat, 28 Jan 2017 at 12:22, Chris Bamford <chris@chrisbamford.plus.com> wrote:
>
>> Hello
>>
>> I am in the process of moving from indexing with 3.6.0 to 4.10.3
>> (albeit in 3.6.0 compatibility mode). Examining the resulting indexes
>> with Luke shows that text fields now contain null markers where stop
>> words have been removed, whereas the previous indexes had nothing:
>> Indexed phrase: "thanks for smiles"
>> Old index content: "thanks smiles"
>> New index content: "thanks null_1 smiles"
>>
>> While I assume this is position-related goodness, it plays havoc with
>> long-established phrase queries, which now fail.
>> From my research I understand that I can selectively enable/disable
>> position increments at query time; is that correct?
>> Is there anything else to consider?
>> I assume that if I want to revert to the old format with the new indexer,
>> I would need to switch position increments off?
>>
>> Thanks
>>
>> Chris

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de