lucene-solr-user mailing list archives

From Tiffany Goguen
Subject Solr 5.5 Issue with CJK and mm being ignored when searching with white space.
Date Wed, 23 Mar 2016 14:21:46 GMT

Is this a new bug?

I am using edismax and I have set mm=100% via <solrQueryParser defaultOperator="AND"/>

My search terms are クイックリファレンス
Term 1 - クイック
Term 2- リファレンス

If I search for クイックリファレンス (no spaces) I get no results.  This is expected.

If I search for クイック リファレンス (space between ク リ) I get 1 result.  This
is bad.  I am expecting mm=100% to still apply.

If I search for クイック OR リファレンス I get 1 result.  This is expected.  The OR
is overriding the mm=100%.

If I search for クイック AND リファレンス I get 1 result.  This is bad.  I am expecting
mm=100% to still apply.
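
For reference, here is a minimal sketch of how a simple mm value maps to a required clause count, following the rule in the Solr reference guide that percentage results are rounded down. The function name min_should_match is hypothetical, conditional specs such as 2<75% are not handled, and this is not Solr's actual code.

```python
def min_should_match(spec: str, optional_clauses: int) -> int:
    """Required clause count for a simple mm spec (integer or percentage)."""
    spec = spec.strip()
    if spec.endswith("%"):
        # Percentage of the optional clauses, kept as a float for the sign test.
        raw = optional_clauses * int(spec[:-1]) / 100
    else:
        raw = int(spec)
    # int() truncates toward zero, so percentage results are rounded down.
    # A negative value means "this many clauses may be missing".
    result = optional_clauses + int(raw) if raw < 0 else int(raw)
    # Clamp to the valid range (an assumption of this sketch).
    return max(0, min(result, optional_clauses))


# Two whitespace-separated terms give two optional clauses.
print(min_should_match("100%", 2))
# A single unspaced term gives one top-level clause.
print(min_should_match("100%", 1))
```

Under this model, mm=100% requires both clauses of the space-separated query to match, but it says nothing about whether both must match in the same field.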

I have seen some open JIRA tickets, but I was not sure if they applied or if this is a new bug.

In CJK searches spaces should not matter.  In the Analysis tool I can see the correct tokens
being generated.  The parser is doing different things based on space or no space.
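
To compare the two parses side by side, one option is to send both forms of the query with mm passed explicitly and debugQuery enabled. The sketch below only builds the request URLs. The host, the core name (mycore), and the shortened qf list are assumptions for illustration; defType, mm, qf, debugQuery and wt are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust for the actual installation.
BASE_URL = "http://localhost:8983/solr/mycore/select"


def debug_url(query: str) -> str:
    """Build a select URL that forces edismax with mm=100% and debug output."""
    params = {
        "q": query,
        "defType": "edismax",
        "mm": "100%",                      # passed explicitly, not via defaultOperator
        "qf": "title_ja^1.2 body_ja^0.5",  # shortened field list for illustration
        "debugQuery": "true",
        "wt": "json",
    }
    return BASE_URL + "?" + urlencode(params)


# One URL without the space, one with it.
for q in ("クイックリファレンス", "クイック リファレンス"):
    print(debug_url(q))
```

Comparing the parsedquery entries in the two responses should show whether mm is being applied to each form.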

With space (not expected result):

When the query is space delimited to two terms, I see each term analyzed separately, per the
following debugQuery output:
クイック is treated in one section:

title_ja:クイック^1.2 | primary_header_ja:クイック^1.2 | file_name:クイック^1.2
| meta_description_ja:クイック^0.5 | secondary_header_ja:クイック^0.5 | body_ja:クイック^0.5
| inlink_text_ja:クイック^1.2)~0.17

リファレンス is treated in one section:

title_ja:リファレンス^1.2 | primary_header_ja:リファレンス^1.2 | file_name:リファレンス^1.2
| meta_description_ja:リファレンス^0.5 | secondary_header_ja:リファレンス^0.5
| body_ja:リファレンス^0.5 | inlink_text_ja:リファレンス^1.2)~0.17

Without space (expected result):

When the query is one term, I see that Solr analyzes it once and the Japanese tokenizer splits
it into two terms:
(title_ja:クイック title_ja:リファレンス)

Given that クイック and リファレンス do not appear together in any of the fields
listed in the query fields (qf),
body_en^0.5 title_en^1.2 url_path^1.2 file_name^1.2 primary_header_en^1.2 secondary_header_en^0.5
meta_description_en^0.5 inlink_text_en^1.2 body_ja^0.5 title_ja^1.2 primary_header_ja^1.2
secondary_header_ja^0.5 meta_description_ja^0.5 inlink_text_ja^1.2

and I have specified that the default query operator is AND,
  <solrQueryParser defaultOperator="AND"/>

nothing will be matched, i.e. (title_ja:クイック title_ja:リファレンス)
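
The difference between the two query shapes can be illustrated with a toy model. It assumes a single document in which the two terms sit in different fields; the document contents and both function names are made up for illustration, and this is not how Solr evaluates queries internally.

```python
# Toy document: each field holds the set of tokens indexed for it.
doc = {
    "title_ja": {"クイック"},
    "body_ja": {"リファレンス"},
}
terms = ["クイック", "リファレンス"]


def matches_per_term(doc, terms):
    """With a space: every term must match in at least one field."""
    return all(any(t in tokens for tokens in doc.values()) for t in terms)


def matches_per_field(doc, terms):
    """Without a space: a single field must contain every token."""
    return any(all(t in tokens for t in terms) for tokens in doc.values())


print(matches_per_term(doc, terms))   # True
print(matches_per_field(doc, terms))  # False
```

In this model the space-separated form matches because each term is satisfied somewhere, while the unspaced form does not because no single field holds both tokens.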

Tiffany Goguen
