lucene-solr-user mailing list archives

From "alessandro.benedetti" <>
Subject Phrase Queries and Punctuation
Date Wed, 01 Feb 2017 15:58:59 GMT
Hi all,
I was just thinking about phrase queries and punctuation (and, more generally,
about how to manage position increments when a sentence delimiter occurs).

At the moment, for multi-valued fields we have the position increment gap
(positionIncrementGap in the Solr schema), which prevents phrase queries from
spanning different values of the same field.
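To make the multi-valued case concrete, here is a simplified model in plain Java. It is not Lucene's actual indexing code. Tokens get consecutive positions, a gap is added between values, and an exact (slop 0) phrase only matches on consecutive positions. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model (not Lucene code) of a position increment gap between field values. */
public class MultiValuedGapDemo {

    /** A term and the position it was assigned. */
    record Token(String term, int position) {}

    /** Assigns positions to whitespace-split terms, jumping ahead by `gap` between values. */
    static List<Token> index(List<String> values, int gap) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                position += gap; // skip positions before the first term of the next value
            }
            for (String term : values.get(v).toLowerCase().split("\\s+")) {
                position += 1;
                tokens.add(new Token(term, position));
            }
        }
        return tokens;
    }

    /** True if the phrase terms occur at consecutive positions (exact phrase, slop 0). */
    static boolean phraseMatches(List<Token> tokens, String phrase) {
        Map<Integer, String> byPosition = new HashMap<>();
        for (Token t : tokens) {
            byPosition.put(t.position(), t.term());
        }
        String[] terms = phrase.toLowerCase().split("\\s+");
        for (Token start : tokens) {
            boolean matched = true;
            for (int i = 0; i < terms.length; i++) {
                if (!terms[i].equals(byPosition.get(start.position() + i))) {
                    matched = false;
                    break;
                }
            }
            if (matched) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = List.of("apache solr", "lucene search");
        System.out.println(phraseMatches(index(values, 0), "solr lucene"));
        System.out.println(phraseMatches(index(values, 100), "solr lucene"));
    }
}
```

With a gap of 0 the phrase "solr lucene" matches across the two values; with a gap of 100 it does not, while a phrase inside a single value still matches.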

In a single-valued text field, we may have hundreds of different
sentences (separated by punctuation).
Generally we don't want phrase queries to span different sentences, so I
would expect a similar position increment behaviour.

A possible solution could be a tokenizer that is able to split
sentences (plenty of NLP approaches for this already exist) and that adds a
position increment gap between sentences as well (smaller than the
multi-valued position increment gap).
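A rough sketch of the sentence-splitting idea, assuming the JDK's java.text.BreakIterator as a stand-in for a proper NLP sentence detector. It is not a Lucene Tokenizer; the sentenceGap parameter is hypothetical and, following the proposal, would be set lower than the field's positionIncrementGap.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-JDK sketch (not a Lucene Tokenizer) that leaves a position gap between sentences. */
public class SentenceGapDemo {

    /** A term and the position it was assigned. */
    record Token(String term, int position) {}

    static List<Token> tokenize(String text, int sentenceGap) {
        List<Token> tokens = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        int position = -1;
        boolean firstSentence = true;
        for (int start = sentences.first(), end = sentences.next();
             end != BreakIterator.DONE;
             start = end, end = sentences.next()) {
            String sentence = text.substring(start, end);
            // Keep only runs of letters/digits as terms; punctuation is dropped.
            String[] words = sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            boolean gapPending = !firstSentence;
            for (String w : words) {
                if (w.isEmpty()) {
                    continue;
                }
                if (gapPending) {
                    position += sentenceGap; // extra increment before the first term of a new sentence
                    gapPending = false;
                }
                position += 1;
                tokens.add(new Token(w, position));
                firstSentence = false;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Solr is fast. Lucene is a library.", 10)) {
            System.out.println(t.term() + " @ " + t.position());
        }
    }
}
```

Here "fast" stays at position 2 while "lucene" moves to position 13, so an exact phrase across the sentence boundary no longer matches.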
A very naive solution would be to add a position increment whenever we
find a punctuation delimiter (similar to the position increment left behind
when stop words are removed).
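The naive punctuation-based variant could look like this, again as a plain-Java model rather than an actual change to the StandardTokenizer. The delimiter set . ! ? ; : is an assumption; delimiters are not indexed, they only add a hypothetical punctuationGap to the next token's position increment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive model: each punctuation delimiter widens the position increment of the next term. */
public class PunctuationGapDemo {

    /** A term and the position it was assigned. */
    record Token(String term, int position) {}

    // Group 1: a run of letters/digits (a term). Group 2: a single delimiter (assumed set).
    private static final Pattern TOKEN = Pattern.compile("([\\p{L}\\p{N}]+)|([.!?;:])");

    static List<Token> tokenize(String text, int punctuationGap) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int position = -1;
        int pendingIncrement = 1;
        while (m.find()) {
            if (m.group(2) != null) {
                pendingIncrement += punctuationGap; // delimiter is not indexed, only widens the gap
                continue;
            }
            position += pendingIncrement;
            pendingIncrement = 1;
            tokens.add(new Token(m.group(1).toLowerCase(), position));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Solr is fast. Lucene is a library", 5)) {
            System.out.println(t.term() + " @ " + t.position());
        }
    }
}
```

With a gap of 5, "fast" ends up at position 2 and "lucene" at position 8. Note that this version also treats each dot in an abbreviation such as "e.g." as a delimiter.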
I have not analysed the implementations in detail yet;
at this stage I was just wondering whether anyone has faced this problem with
Lucene and Solr.
What kind of side effects could there be if we added the position increment gap
on punctuation delimiters by default in the Standard Tokenizer?


Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd.