lucene-dev mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3636) edismax, synonyms and mm=100%
Date Thu, 16 Aug 2012 00:09:38 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435631#comment-13435631
] 

Jack Krupansky commented on SOLR-3636:
--------------------------------------

I checked the code in both 4x and 3.6. It in fact uses the actual number of optional
terms generated in the top-level BooleanQuery, not the "number of terms found by edismax
from the original query" as stated in the issue description.
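
To make the arithmetic concrete, here is a minimal sketch in Python (not Solr's Java) of how
an mm value could be resolved against the number of optional clauses. The function name
resolve_mm and the clamping at the end are assumptions for illustration, and only the plain
integer and percentage forms of mm are handled; this is a simplified model, not the actual
Solr code.

```python
def resolve_mm(spec: str, optional_clauses: int) -> int:
    """Resolve a simple mm spec (integer or percentage) to a required clause count.

    Hypothetical helper: a simplified model, not Solr's implementation.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # int() truncates toward zero, so 75% of 3 clauses gives 2.
        calc = int(optional_clauses * percent / 100)
        # A negative percentage means "this many clauses may be missing".
        result = optional_clauses + calc if percent < 0 else calc
    else:
        count = int(spec)
        result = optional_clauses + count if count < 0 else count
    # Keep the result within 0..optional_clauses (an assumption of this sketch).
    return max(0, min(result, optional_clauses))


# The count that matters is the number of optional clauses in the top-level
# BooleanQuery, whatever that number turns out to be after parsing.
print(resolve_mm("100%", 2))
print(resolve_mm("100%", 3))
```

Under this model, mm=100% always resolves to "all top-level optional clauses", so the open
question is only how many such clauses the parser produces and what each one contains.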

I suspect that this is a variation of SOLR-3589: "Edismax parser does not honor mm parameter
if analyzer splits a token". There, the low-level Lucene query parser code passes a single
term to the field analyzer and gets multiple terms back. Because autoGeneratePhraseQueries
is false, the terms are ORed, since OR is the default operator. The Lucene-level code does
not know about "mm", Solr, or any request parameters at all.

Hmmm... maybe the solution to some of these issues is that if mm is 100%, Solr should implicitly
set the default query operator directly to "AND", which would give the Lucene code the information
needed to generate an AND rather than an OR.
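
A toy model of that idea, again in Python and assuming the suspicion above is right: each
top-level clause holds whatever terms the analyzer returned for one query token, and the
default operator decides whether a document must contain any or all of them. The names
clause_matches and query_matches are made up for this sketch, and real Lucene scoring and
query structure are ignored.

```python
def clause_matches(doc_terms, clause_terms, operator):
    """Check one top-level clause; clause_terms came from a single query token."""
    hits = [term in doc_terms for term in clause_terms]
    # AND requires every analyzed term, OR is satisfied by any one of them.
    return all(hits) if operator == "AND" else any(hits)


def query_matches(doc_terms, clauses, operator):
    """Model mm=100%: every top-level clause has to match."""
    return all(clause_matches(doc_terms, terms, operator) for terms in clauses)


# "big monkeyhouse" with the synonym monkeyhouse => monkey house:
# two top-level clauses, the second split into two terms by the analyzer.
clauses = [["big"], ["monkey", "house"]]

partial_doc = {"big", "monkey"}
full_doc = {"big", "monkey", "house"}

for operator in ("OR", "AND"):
    print(operator,
          query_matches(partial_doc, clauses, operator),
          query_matches(full_doc, clauses, operator))
```

With OR, the document containing only "big" and "monkey" still satisfies both top-level
clauses; with AND it no longer does, while the document containing all three terms matches
in both cases.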


                
> edismax, synonyms and mm=100%
> -----------------------------
>
>                 Key: SOLR-3636
>                 URL: https://issues.apache.org/jira/browse/SOLR-3636
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>            Reporter: Lance Norskog
>            Priority: Minor
>             Fix For: 4.0
>
>
> There is a problem with query-side synonyms, edismax and must-match=100%.
> edismax interprets must-match=100% as "number of terms found by edismax from the original
> query". These terms go through the query analyzer, and the synonym filter creates more terms,
> *but* the must-match term count is not incremented. Thus, given a synonym of
> {code}
> monkeyhouse => monkey house
> {code}
> the query {{q=big+monkeyhouse&mm=100%}} becomes (effectively) {{q=big+monkey+house&mm=2}}.
> This query finds documents matching only two out of three terms {{big+monkey, monkey+house,
> big+house}}.
> This might also be a problem in dismax.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


