lucene-dev mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
Date Wed, 15 Aug 2012 21:56:37 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435558#comment-13435558 ]

Jack Krupansky commented on SOLR-3589:
--------------------------------------

The root problem is that when automatic phrase query generation is turned off (which is the
default, and is the case for the text_general field in particular), the core Lucene query
parser generates a query for the tuple of sub-terms using the default query operator, which
is "OR" unless configured otherwise. There is no notion of an "mm" (min-match) parameter down
at that level in Lucene, which knows nothing about Solr, edismax, or request parameters.
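
To make the mechanism concrete, here is a toy model in Python. It is only a sketch and not
Lucene or Solr code: the analyze() function is a stand-in analyzer that splits on
non-alphanumeric characters, and the names matches() and clause_matches() are invented for
illustration. It assumes mm is counted over whitespace-separated clauses only, while the
sub-terms of one split token are combined with the default operator.

```python
import re

def analyze(text):
    # Stand-in analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def clause_matches(sub_terms, doc_terms, operator):
    # Sub-terms of one split token are joined with the default operator.
    # mm is never consulted at this level.
    hits = [t in doc_terms for t in sub_terms]
    return all(hits) if operator == "AND" else any(hits)

def matches(query, doc, mm_percent=100, operator="OR"):
    doc_terms = set(analyze(doc))
    # One clause per whitespace-separated token, as the query parser sees it.
    clauses = [c for c in (analyze(tok) for tok in query.split()) if c]
    # mm only counts these top-level clauses.
    required = len(clauses) * mm_percent // 100
    satisfied = sum(clause_matches(c, doc_terms, operator) for c in clauses)
    return satisfied >= required

doc = "a fire burned"
print(matches("fire fly", doc))                  # False: two clauses, mm=100%
print(matches("fire-fly", doc))                  # True: one clause, fire OR fly
print(matches("fire-fly", doc, operator="AND"))  # False: fire AND fly
```

In this model "fire-fly" is a single clause, so mm=100% is satisfied as soon as either
sub-term matches, which mirrors the behavior described in the issue.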

As things stand, the only option is to set the default query operator, "q.op", to "AND".

You can of course also turn on autoGeneratePhraseQueries or select an analyzer that doesn't
split terms.
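
As a rough illustration of the first workaround, the Python sketch below builds a select
URL that sends q.op=AND together with the edismax parameters. The host, port, and core name
are placeholders and do not come from this issue, and build_url() is a hypothetical helper.
Whether a particular Solr version applies q.op to the sub-terms of a split token should be
verified against that version.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def build_url(user_query):
    params = {
        "defType": "edismax",  # use the extended dismax parser
        "q": user_query,
        "mm": "100%",          # require all top-level clauses
        "q.op": "AND",         # workaround suggested in the comment above
    }
    # urlencode percent-escapes the values, so "100%" becomes "100%25".
    return SOLR_SELECT + "?" + urlencode(params)

url = build_url("fire-fly")
print(url)
```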

At this point, I would advise resolving this issue as "Won't Fix", although it could also
be spun off into a Lucene issue to add support for min-match down at that level, which edismax
can then also communicate with.


                
> Edismax parser does not honor mm parameter if analyzer splits a token
> ---------------------------------------------------------------------
>
>                 Key: SOLR-3589
>                 URL: https://issues.apache.org/jira/browse/SOLR-3589
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 3.6
>            Reporter: Tom Burton-West
>
> With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer
> chain (e.g. "fire-fly" => fire fly), the mm parameter is ignored and the equivalent of
> an OR query for "fire OR fly" is produced.
> This is particularly a problem for languages that do not use white space to separate
> words, such as Chinese or Japanese.
> See these messages for more discussion:
> http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
> http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
> http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

