lucene-solr-user mailing list archives

From Nick D <>
Subject Min-should-match and Multi-word synonyms unexpected result
Date Mon, 05 Feb 2018 23:58:04 GMT
I have run into an issue with multi-word synonyms and a min-should-match
(MM) of anything other than `0`, *Solr version 6.6.0*.

Here are my example queries, the first with mm set to zero and the second
with a non-zero value:

With MM set to 0

which parses to:

+ngs_field_description:interface +ngs_field_description:builder)
ngs_field_description:eib) | ((+ngs_title:enterprise
+ngs_title:interface +ngs_title:builder) ngs_title:eib))~0.01"

and using my default MM (2<-35%)

which parses to:

((((+ngs_field_description:enterprise +ngs_field_description:interface
+ngs_field_description:builder) ngs_field_description:eib)~2) |
(((+ngs_title:enterprise +ngs_title:interface +ngs_title:builder)
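
For reference, the arithmetic behind an mm spec such as 2<-35% can be
sketched as follows. This is a minimal Python sketch of the documented mm
rules (conditional "N<value" form, negative values meaning "this many may be
missing", percentages rounded down), not Solr's own code. The function name
min_should_match is hypothetical and exists only for this illustration.

```python
import re


def min_should_match(optional_clauses, spec):
    """Resolve an mm spec (e.g. '2<-35%') to a required clause count.

    Hypothetical helper: a sketch of the documented mm rules, not Solr's code.
    """
    spec = spec.strip()
    result = optional_clauses
    if "<" in spec:
        # Conditional form "N<value": 'value' applies only above N clauses.
        for part in re.sub(r"\s*<\s*", "<", spec).split():
            bound, value = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, value)
        return result
    if spec.endswith("%"):
        # Percentage of the clause count; int() truncates toward zero.
        calc = int(optional_clauses * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # A negative value means "this many clauses may be missing".
    result = optional_clauses + calc if calc < 0 else calc
    # Clamp to the range [0, optional_clauses].
    return max(0, min(optional_clauses, result))


for n in (1, 2, 3, 4):
    print(n, min_should_match(n, "2<-35%"))
```

Under these rules, both two and three optional clauses resolve to a
requirement of 2, which is consistent with the ~2 in the parsed query above.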

My synonym here is:
EIB, Enterprise Interface Builder

For my two documents I have the field ngs_title with values "EIB" (Doc 1)
and "enterprise interface builder" (Doc 2)

For both queries, Doc 1 is always returned because EIB is matched. Doc 2,
however, is not returned when the MM is non-zero, even though I have EIB and
Enterprise Interface Builder defined as equivalent synonyms. From the parsed
query string I can see the ~2 being applied for the MM, but my expectation
was that it would be met via the synonyms, given that I am not actually
searching for a phrase.
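
One possible reading of the parsed query is that the ~2 is counted against
the two SHOULD clauses inside each field group: the all-terms-required group
(+enterprise +interface +builder) and the single term eib. The Python sketch
below checks Doc 2's ngs_title against that reading. It is an illustration
only: the function names are hypothetical, the tokens are assumed to be plain
lowercased words (edge n-grams from the index analyzer are ignored), and
Doc 1 is left out because its other field value is not shown.

```python
def satisfied_should_clauses(title_tokens):
    """Count how many of the two SHOULD clauses on ngs_title are satisfied."""
    phrase_terms = {"enterprise", "interface", "builder"}
    hits = 0
    if phrase_terms <= title_tokens:  # (+enterprise +interface +builder)
        hits += 1
    if "eib" in title_tokens:         # eib
        hits += 1
    return hits


def title_group_matches(title_tokens, mm):
    # With no MUST clauses, at least one SHOULD clause has to match,
    # so the effective minimum is never below 1.
    return satisfied_should_clauses(title_tokens) >= max(1, mm)


# Assumed simplified tokens for Doc 2's ngs_title value.
doc2_title = {"enterprise", "interface", "builder"}

print(title_group_matches(doc2_title, 0))  # mm=0: one clause is enough
print(title_group_matches(doc2_title, 2))  # ~2: both clauses are needed
```

Under these assumptions Doc 2 satisfies only one of the two clauses, so it
passes with mm set to 0 and fails once ~2 is applied.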

I couldn't find much on the relationship between the two, apart from some of
the things Doug Turnbull had linked to in another solr-user question and
this blog post that mentions weirdness around MM and multi-word synonyms:

I also looked through the comments here, but at first glance didn't see
anything that jumped out at me.

Here is the field definition for the ngs_* fields:

<fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory"
        <charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="([()])" replacement=""/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
pattern="(^[^0-9A-Za-z_]+)|([^0-9A-Za-z_]+$)" replacement=""/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>

I am not sure whether MM can no longer be used for these types of queries or
whether there is something I set up incorrectly. Any help would be greatly
appreciated.

