lucene-solr-user mailing list archives

From "Casey Durfee" <>
Subject Re: DisMax request handler doesn't work with stopwords?
Date Thu, 07 Jun 2007 21:59:33 GMT
Thank you!  That makes sense.

>>> Mike Klaas <> 6/7/2007 2:35 PM >>>
On 7-Jun-07, at 1:41 PM, Casey Durfee wrote:

> It appears that if your search terms include stopwords and you use  
> the DisMax request handler, you get no results whereas the same  
> search with the standard request handler does give you results.  Is  
> this a bug or by design?

There is a subtlety with stopwords and dismax.  Imagine a search for
"what's in python", using a typical analyzer with stopwords for
fields such as title, inlinks, and rawText, but a more restrictive
analyzer with no stopwords for fields such as url.

For the above search, the following weight function (the qf parameter)

    title^1.2 inlinks^1.4 rawText^1.0

produces the following parsed query string:

    (rawText:what | inlinks:what^1.4 | title:what^1.2)~0.01
    (rawText:python | inlinks:python^1.4 | title:python^1.2)~0.01
    (rawText:"what python"~5 | inlinks:"what python"~5^1.4 | title:"what python"~5^1.2)~0.01

while the same query with the weight function

    title^1.2 inlinks^1.4 rawText^1.0 url^1.0

produces this query string:

    (rawText:what | url:what | inlinks:what^1.4 | title:what^1.2)~0.01
    (url:in)~0.01
    (rawText:python | url:python | inlinks:python^1.4 | title:python^1.2)~0.01
    (rawText:"what python"~5 | url:"what in python"~5 | inlinks:"what python"~5^1.4 | title:"what python"~5^1.2)~0.01

Note that the latter includes the term (url:in)~0.01 on its own.  This
interacts poorly with a high mm (minimum number of matching clauses)
setting in dismax: the query now has three term clauses instead of
two, so at mm=100%, for example, a document must contain 'in' in its
url field to match, which was probably not the intent of the query.
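To make the mm interaction concrete, here is a small, self-contained Java model of it. It is a sketch, not Solr's implementation, and rests on several assumptions: the query is already tokenized to "what", "in", "python"; each field's analysis is reduced to a per-field stopword set (a hypothetical list containing only "in"); boosts and the phrase clause are ignored; the document is made up; and mm is modelled as a percentage of the clause count, rounded down. The names buildClauses, requiredClauses and matches are hypothetical helpers. Requires Java 9+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the dismax/stopword interaction; not Solr's code.
// Assumptions (hypothetical): pre-tokenized query, per-field analysis is only
// a stopword set, boosts and the pf phrase clause are ignored.
public class DismaxStopwordSketch {

    /** One clause per query term: the "field:term" disjuncts whose field keeps the term. */
    public static List<List<String>> buildClauses(List<String> terms, List<String> qf,
                                                  Map<String, Set<String>> stopwords) {
        List<List<String>> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> disjuncts = new ArrayList<>();
            for (String field : qf) {
                // A field whose analyzer removes the term contributes no disjunct.
                if (!stopwords.getOrDefault(field, Set.of()).contains(term)) {
                    disjuncts.add(field + ":" + term);
                }
            }
            // A term dropped by every field's analyzer contributes no clause at all.
            if (!disjuncts.isEmpty()) {
                clauses.add(disjuncts);
            }
        }
        return clauses;
    }

    /** mm as a percentage of the clause count, rounded down; at least one clause (simplification). */
    public static int requiredClauses(int clauseCount, int mmPercent) {
        return Math.max(1, clauseCount * mmPercent / 100);
    }

    /** A document (its indexed "field:term" pairs) matches if enough clauses have a hit. */
    public static boolean matches(List<List<String>> clauses, Set<String> doc, int mmPercent) {
        int matched = 0;
        for (List<String> clause : clauses) {
            for (String disjunct : clause) {
                if (doc.contains(disjunct)) {
                    matched++;
                    break; // one hit is enough for a disjunction clause
                }
            }
        }
        return matched >= requiredClauses(clauses.size(), mmPercent);
    }

    public static void main(String[] args) {
        List<String> terms = List.of("what", "in", "python");
        Set<String> stops = Set.of("in"); // hypothetical stopword list
        // rawText, inlinks and title drop stopwords; url keeps every term.
        Map<String, Set<String>> stopwords =
                Map.of("rawText", stops, "inlinks", stops, "title", stops);

        // A hypothetical document: relevant, but "in" never appears in its url.
        Set<String> doc = Set.of("title:what", "title:python",
                                 "rawText:what", "rawText:python", "url:python");

        List<List<String>> withoutUrl =
                buildClauses(terms, List.of("rawText", "inlinks", "title"), stopwords);
        List<List<String>> withUrl =
                buildClauses(terms, List.of("rawText", "url", "inlinks", "title"), stopwords);

        System.out.println(withoutUrl.size() + " clauses, matches at mm=100%: "
                + matches(withoutUrl, doc, 100));
        System.out.println(withUrl.size() + " clauses, matches at mm=100%: "
                + matches(withUrl, doc, 100));
        System.out.println("lone clause: " + withUrl.get(1));
    }
}
```

Running main should show the field list without url producing two clauses that the sample document satisfies at mm=100%, while adding url yields a third clause, [url:in], that the document cannot satisfy, so it no longer matches.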

