lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <>
Subject Highlighting content field problem when using JiebaTokenizerFactory
Date Tue, 13 Oct 2015 09:04:29 GMT

I'm trying to use the JiebaTokenizerFactory to index Chinese characters in
Solr. It works fine with the segmentation when I'm using
the Analysis function on the Solr Admin UI.
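
For context, the field type is presumably wired up along these lines in schema.xml; the exact factory class path is an assumption, since it depends on which jieba-analysis Solr integration is on the classpath:

    <!-- Sketch only: the tokenizer class path below is assumed, not confirmed -->
    <fieldType name="text_cn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory"/>
      </analyzer>
    </fieldType>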

However, when I try to do highlighting in Solr, the highlights are not placed correctly. For example, when I search for 自然环境与企业本身,
it highlights 认<em>为自然环</em><em>境</em><em>与企</em><em>业本</em>身的

Even when I search for an English word like responsibility, it highlights
 <em> *responsibilit<em>*y.

Basically, the highlighting goes off by 1 character/space consistently.
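
For reference, the kind of request that reproduces this looks roughly as follows; the collection name is a placeholder, but hl and hl.fl are standard Solr highlighting parameters:

    http://localhost:8983/solr/collection1/select?q=content:自然环境与企业本身&hl=true&hl.fl=content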

This problem only happens in the content field, and not in any other field.
Does anyone know what could be causing the issue?

I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.

