lucene-solr-user mailing list archives

From Jeff Newburn <jnewb...@zappos.com>
Subject Dismax Minimum Match/Stopwords Bug
Date Thu, 11 Dec 2008 17:34:46 GMT
I have discovered some odd behavior with our Minimum Match (mm) setting:
certain queries return no results at all. Specifically, two-word searches
in which one word is "the" return nothing. From what we can gather, the
minimum match criteria require both words whenever the query has 2 words.
Unfortunately, the stopword filter removes "the", so Solr ends up requiring
2 words when only 1 remains to match on. Is there a way around this? I
need it either to require only the non-stopwords or to not filter out
stopwords. We know stopwords are causing the issue because taking out the
stopwords fixes the problem. Changing the mm setting to 75% also fixes the
problem.

Example:
Brand: The North Face
Search: the north (returns no results)
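
The arithmetic behind this report can be sketched in a few lines of Python.
This is a simplified model of the behavior described above, not Solr's
actual code: it assumes the removed stopword is still counted as a clause,
as the message suspects, and it handles only plain integers, percentages,
and a single conditional such as 2<-1 (written 2&lt;-1 in the XML config).
The names min_should_match, can_match, and STOPWORDS are hypothetical.

```python
def min_should_match(spec: str, clause_count: int) -> int:
    """Return how many optional clauses must match for a simple mm spec.

    Simplified model: supports an integer ("2", "-1"), a percentage ("75%"),
    and one conditional ("2<-1"). Not Solr's real implementation.
    """
    if "<" in spec:
        threshold, spec = spec.split("<", 1)
        # At or below the threshold, every clause is required.
        if clause_count <= int(threshold):
            return clause_count
    if spec.endswith("%"):
        # Percentages are truncated toward zero (1.5 becomes 1).
        delta = int(clause_count * int(spec[:-1]) / 100)
    else:
        delta = int(spec)
    # A negative value means "this many clauses may be missing".
    return clause_count + delta if delta < 0 else delta


def can_match(spec, query_terms, stopwords):
    # Assumption taken from the message: the stopword still counts as a
    # clause, but it can never match once the stop filter has removed it.
    required = min_should_match(spec, len(query_terms))
    matchable = [t for t in query_terms if t.lower() not in stopwords]
    return len(matchable) >= required


STOPWORDS = {"the"}  # hypothetical stand-in for stopwords.txt

print(can_match("2<-1", ["the", "north"], STOPWORDS))  # False: 2 required, 1 matchable
print(can_match("75%", ["the", "north"], STOPWORDS))   # True: 1 required, 1 matchable
```

Under these assumptions, 2<-1 demands both clauses for a two-word query,
while 75% of 2 truncates to 1, which would explain why switching to 75%
makes "the north" match again.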

Our config is basically:
MM: <str name="mm">2&lt;-1</str>
FieldType:
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>



