lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
Date Thu, 01 Nov 2012 00:37:13 GMT

     [ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated SOLR-3589:
------------------------------

    Attachment: SOLR-3589.patch

here's my hack patch.

no idea if its right... in particular i dont really know what is going on with this query
parser in general.

the high level idea is that we can detect if the "multiple tokens" are synonyms versus CJK
by the coord setting of the BQ, since coord will be enabled in the e.g. CJK case (same as
if it were whitespace), but disabled for synonyms.

but it could be totally wrong: at least tests pass and it might give someone else some ideas
or be a useful workaround :) 
                
> Edismax parser does not honor mm parameter if analyzer splits a token
> ---------------------------------------------------------------------
>
>                 Key: SOLR-3589
>                 URL: https://issues.apache.org/jira/browse/SOLR-3589
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 3.6, 4.0-BETA
>            Reporter: Tom Burton-West
>         Attachments: SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz
>
>
> With edismax mm set to 100%  if one of the tokens is split into two tokens by the analyzer
chain (i.e. "fire-fly"  => fire fly), the mm parameter is ignored and the equivalent of
 OR query for "fire OR fly" is produced.
> This is particularly a problem for languages that do not use white space to separate
words such as Chinese or Japenese.
> See these messages for more discussion:
> http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
> http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
> http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message