lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: Position increment in WordDelimiterFilter.
Date Mon, 18 Jan 2016 14:27:44 GMT
On 1/18/2016 6:21 AM, Modassar Ather wrote:
> Can you please send us tokens you get (and positions) when you analyze
> *WiFi device*
> Tokens generated and their respective positions.
> WiFi                1
> Wi                  1
> WiFi                1
> Fi                  2
> device              3

It seems very odd to me that the original value would show up twice with
the preserveOriginal parameter set, but I am seeing the same behavior on
4.7 and 5.3.  Because both copies are at the same position, this will
not affect which documents match, but it will slightly affect relevance
scoring if you are not specifying a sort parameter.  Everything else
about the analysis looks correct to me, and the positions you see are
needed for a phrase query to work correctly.

I have seen working configurations where preserveOriginal is set on the
index analysis but NOT set on query analysis.  This is how my own schema
is configured.  One of the reasons for this configuration is to reduce
the number of terms in the query so it is faster than it would be if
preserveOriginal were present and generated additional terms.  Setting
preserveOriginal on the index side ensures a match whether mixed case is
used in the query or not.
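
As a rough illustration of that kind of setup, a field type might look
something like the sketch below.  This is not my actual schema: the
field type name, the tokenizer, the lowercase filter, and every
attribute value other than preserveOriginal are assumptions made for
the example, so adjust them to your own analysis chain.

```xml
<!-- Hypothetical field type; the name "text_wdf" is made up for this example. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: preserveOriginal="1" keeps "WiFi" next to its parts. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query side: preserveOriginal is off, so fewer terms end up in the query. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain along these lines, a query for "wifi" should match the
preserved original term in the index, and a query for "WiFi" should be
split into "wi" and "fi" and match those parts at adjacent positions.
Check the Analysis screen in the admin UI to confirm what your own
configuration produces.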

