lucene-dev mailing list archives

From Jan Høydahl (Commented) (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-3085) Fix the dismax/edismax stopwords mm issue
Date Thu, 02 Feb 2012 00:05:53 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198347#comment-13198347 ]

Jan Høydahl commented on SOLR-3085:
-----------------------------------

You're right that, technically, it is not marked as required. But in the context of the "feature"
we're discussing, the reason people get 0 hits is that mm=100% is counted over all (SHOULD)
clauses, and that effectively makes alltags:the required.
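
To make the failure concrete, here is a minimal Python model (a hypothetical sketch, not Solr's code) of how such a query is evaluated: each query term becomes one DisMax clause holding a field query for every qf field in which the term survives analysis, and a document matches only if at least mm of those SHOULD clauses match. The field names title_en and alltags come from this discussion; the query "the beatles", the stopword set and the document are made up for illustration, as are the names DisMaxClause, build_clauses and matches.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical model of dismax clause construction, not Solr's implementation.
@dataclass
class DisMaxClause:
    term: str
    fields: List[str]  # qf fields in which the term survived analysis

def build_clauses(terms: List[str], qf: List[str],
                  stopwords: Dict[str, Set[str]]) -> List[DisMaxClause]:
    """One DisMax clause per term; a field is left out where the term is a stopword."""
    clauses = []
    for term in terms:
        fields = [f for f in qf if term not in stopwords.get(f, set())]
        if fields:  # a term that is a stopword in every field yields no clause
            clauses.append(DisMaxClause(term, fields))
    return clauses

def clause_matches(clause: DisMaxClause, doc: Dict[str, Set[str]]) -> bool:
    """A DisMax clause matches if any one of its field queries matches."""
    return any(clause.term in doc.get(f, set()) for f in clause.fields)

def matches(doc: Dict[str, Set[str]], clauses: List[DisMaxClause], mm: int) -> bool:
    """The whole query matches if at least mm of its SHOULD clauses match."""
    return sum(clause_matches(c, doc) for c in clauses) >= mm

qf = ["title_en", "alltags"]
stopwords = {"title_en": {"the", "a", "of"}}  # alltags has no stop filter
clauses = build_clauses(["the", "beatles"], qf, stopwords)
doc = {"title_en": {"beatles"}, "alltags": {"rock", "beatles"}}

# mm=100% of two clauses is 2, so the alltags-only clause for "the" must match.
print(matches(doc, clauses, mm=2))  # False
print(matches(doc, clauses, mm=1))  # True
```

The clause for "the" contains only alltags:the, so with mm=2 any document whose alltags field lacks "the" is rejected even though it matches "beatles" in both fields.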

What James suggested, and what most people tricked by this "feature" expect, is that if "the"
is a stopword for one of the qf fields, it becomes optional in some way.

So how can we get that end result? First, we need a way to safely detect that we're in this
scenario, perhaps by checking whether each DisMax clause contains a field query for every
field listed in QF. If one or more is missing, we can assume that the query term is a stopword
in one or more of those fields. Then one option is to subtract from the MM count accordingly:
in our case above, when we detect that the DisMax clause for "the" does not contain "title_en",
we set mm=mm-1, which gives an MM of 1 instead of 2, and we get hits. This is probably
the easiest solution.
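
The detection heuristic and the mm=mm-1 adjustment could look roughly like the following Python sketch. DisMaxClause and adjusted_mm are hypothetical names, not Solr classes; the sketch assumes a plain integer mm (2 in the example) and, like the heuristic above, treats every missing qf field as a sign that the term was a stopword there.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical model, not Solr's implementation.
@dataclass
class DisMaxClause:
    term: str
    fields: List[str]  # qf fields that produced a field query for this term

def adjusted_mm(clauses: List[DisMaxClause], qf: List[str], mm: int) -> int:
    """Subtract one from mm for every clause lacking a field query for some qf field.

    A missing field is assumed to mean the term was removed as a stopword
    by that field's analyzer.
    """
    incomplete = sum(1 for c in clauses if not set(qf) <= set(c.fields))
    return max(0, mm - incomplete)  # never go below zero

qf = ["title_en", "alltags"]
clauses = [
    DisMaxClause("the", ["alltags"]),                  # stopword in title_en
    DisMaxClause("beatles", ["title_en", "alltags"]),  # present in every field
]
print(adjusted_mm(clauses, qf, mm=2))  # 1
```

Clamping at zero is harmless because a Lucene BooleanQuery made only of SHOULD clauses still requires at least one of them to match.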

Another way would be to keep mm as is, move the affected clause out of the BooleanQuery,
and add it as a BoostQuery instead?
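
The alternative keeps mm untouched but applies it only to the clauses that have a field query for every qf field, while the incomplete clauses (here the one for "the") only add to the score. The Python sketch below models that idea under stated assumptions: all names are hypothetical, the score is a toy count of matching clauses rather than Lucene scoring, and the percentage is rounded down.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical model, not Solr's implementation.
@dataclass
class DisMaxClause:
    term: str
    fields: List[str]  # qf fields that produced a field query for this term

def is_complete(clause: DisMaxClause, qf: List[str]) -> bool:
    return set(qf) <= set(clause.fields)

def clause_matches(clause: DisMaxClause, doc: Dict[str, Set[str]]) -> bool:
    return any(clause.term in doc.get(f, set()) for f in clause.fields)

def evaluate(doc: Dict[str, Set[str]], clauses: List[DisMaxClause],
             qf: List[str], mm_percent: int) -> Tuple[bool, int]:
    """Apply mm only to complete clauses; incomplete ones merely add to the score."""
    counted = [c for c in clauses if is_complete(c, qf)]
    boosts = [c for c in clauses if not is_complete(c, qf)]
    required = len(counted) * mm_percent // 100  # rounded down in this sketch
    hits = sum(clause_matches(c, doc) for c in counted)
    boost_hits = sum(clause_matches(c, doc) for c in boosts)
    # Like a pure disjunction, at least one clause must match something.
    if hits < required or hits + boost_hits == 0:
        return False, 0
    return True, hits + boost_hits  # toy score: number of matching clauses

qf = ["title_en", "alltags"]
clauses = [DisMaxClause("the", ["alltags"]),
           DisMaxClause("beatles", ["title_en", "alltags"])]
print(evaluate({"title_en": {"beatles"}, "alltags": {"rock"}}, clauses, qf, 100))  # (True, 1)
```

With this split, mm=100% still means "every complete clause", so precision for the non-stopword terms is unchanged, and documents that also contain "the" in alltags simply rank higher.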

This behavior should be parameter-driven, e.g. {{&mm.sw=false}}, read as "Minimum should
match does not require stopwords".
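
If the behavior is made parameter-driven, the request has to carry the proposed flag. This Python sketch reads mm.sw from a query string with the standard urllib.parse.parse_qs. Note that mm.sw is only the name proposed in this comment (not an existing Solr parameter), the helper stopwords_relaxed is hypothetical, and treating an absent flag as true (today's behavior) is an assumption.

```python
from urllib.parse import parse_qs

def stopwords_relaxed(query_string: str) -> bool:
    """True when the request asks mm not to count stopword clauses.

    mm.sw is the name proposed in the SOLR-3085 discussion, not a real Solr
    parameter; this helper is hypothetical.
    """
    params = parse_qs(query_string)
    # Absent flag keeps today's behavior (assumption): stopword clauses count toward mm.
    value = params.get("mm.sw", ["true"])[-1]
    return value.strip().lower() == "false"

request = "q=the+beatles&defType=edismax&qf=title_en+alltags&mm=100%25&mm.sw=false"
print(stopwords_relaxed(request))          # True
print(stopwords_relaxed("q=the+beatles"))  # False
```

Defaulting to the current behavior keeps existing deployments unchanged unless they opt in.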
                
> Fix the dismax/edismax stopwords mm issue
> -----------------------------------------
>
>                 Key: SOLR-3085
>                 URL: https://issues.apache.org/jira/browse/SOLR-3085
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>            Reporter: Jan Høydahl
>              Labels: MinimumShouldMatch, dismax, stopwords
>             Fix For: 3.6, 4.0
>
>
> As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here http://search-lucene.com/m/Yne042qEyCq1
> and here http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if not
> all fields used in QF have exactly the same stopword lists.
> The typical solutions are to not use stopwords, to harmonize stopword lists across all fields
> in your QF, or to relax the MM to a lower percentage. Sometimes these are not acceptable workarounds,
> and we should find a better solution.
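
The last workaround trades precision for recall. As a rough illustration, here is a Python sketch of how an integer or percentage mm turns into a required clause count. It assumes percentages are rounded down as the DisMax documentation describes, clamps the result to the number of clauses (a choice of this sketch), and leaves out conditional specs such as 3<90%; the function name min_should_match is hypothetical.

```python
def min_should_match(spec: str, optional_clauses: int) -> int:
    """Hypothetical helper: clauses required for a plain integer or percentage mm.

    Conditional specs such as "3<90%" are deliberately not handled here.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Take |percent| of the clause count, rounded down (assumed rounding rule).
        calc = optional_clauses * abs(percent) // 100
        # Negative percentages say how many clauses may be missing.
        result = optional_clauses - calc if percent < 0 else calc
    else:
        n = int(spec)
        # Negative integers say how many clauses may be missing.
        result = optional_clauses + n if n < 0 else n
    # Clamp to [0, optional_clauses]; a choice made by this sketch.
    return max(0, min(result, optional_clauses))

for spec in ["100%", "50%", "-1"]:
    print(f"mm={spec}: 2 clauses -> {min_should_match(spec, 2)}, "
          f"3 clauses -> {min_should_match(spec, 3)}")
```

Lowering mm to 50% rescues the two-clause "the ..." query, but it also lets a three-term query without any stopwords match on a single term, which is why relaxing mm is not always acceptable.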


       


