lucene-solr-user mailing list archives

From Tom Burton-West <tburt...@umich.edu>
Subject Best practices for Solr highlighter for CJK
Date Wed, 02 Jan 2013 18:51:44 GMT
Hello all,

What are the best practices for setting up the highlighter to work with CJK?
We are using the ICUTokenizer with the CJKBigramFilter, so overlapping
bigrams are what is actually searched. However, the highlighter seems to
highlight only the first of any two overlapping bigrams. For example, ABC
is searched as AB BC, but only AB gets highlighted even though the matching
string is ABC. (Here ABC stands for Chinese characters such as 大亚湾, which
is searched as 大亚 亚湾, but only 大亚 is highlighted rather than 大亚湾.)
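
To make the expected behavior concrete, here is a minimal Python sketch of the
offsets involved. It is not Solr or Lucene code, and it does not reproduce what
the highlighter does internally. The function names (overlapping_bigrams,
merge_spans, highlight) and the <em> markers are made up for illustration. It
only shows that the two bigrams of a three-character string overlap by one
character, and that merging their offset ranges would cover the whole match.

```python
def overlapping_bigrams(text):
    """Return (bigram, start, end) for every adjacent character pair."""
    return [(text[i:i + 2], i, i + 2) for i in range(len(text) - 1)]


def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans into maximal spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # This span overlaps or touches the previous one: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def highlight(text, spans, pre="<em>", post="</em>"):
    """Wrap each merged span of text in the pre/post markers."""
    out, pos = [], 0
    for start, end in merge_spans(spans):
        out.append(text[pos:start])
        out.append(pre + text[start:end] + post)
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "大亚湾"
bigrams = overlapping_bigrams(text)      # 大亚 at (0, 2), 亚湾 at (1, 3)
spans = [(start, end) for _, start, end in bigrams]
print(bigrams)
print(highlight(text, spans))            # whole string wrapped once
```

With the spans merged, the three characters are wrapped as a single
highlighted run, which is the output I would expect to see.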

Is there some highlighting parameter that might fix this?

Tom Burton-West
