lucene-solr-user mailing list archives

From "Scott Chu"<scott....@udngroup.com>
Subject Re: Highlighting content field problem when using JiebaTokenizerFactory
Date Tue, 20 Oct 2015 03:32:51 GMT
Hi Edwin,

I haven't used Jieba for Chinese (I only use CJK, very fundamental, I know), so I haven't run into
this problem.

I'd suggest you post your schema.xml so we can see how you define your content field and the
field type it uses.
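
For comparison, a Jieba-based field type usually looks something like this in schema.xml. This
is only a sketch: the tokenizer's package prefix and the segMode attribute depend on which
Jieba-for-Solr wrapper jar you have, and the field/type names are just a guess at your setup:

    <fieldType name="text_jieba" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory" segMode="SEARCH"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory" segMode="SEARCH"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="content" type="text_jieba" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

Whether the field stores termVectors/termPositions/termOffsets matters too, since that decides
which highlighter Solr can use; either way, the highlight positions come from the start/end
offsets that the Jieba tokenizer produces.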

In the meantime, refer to these articles; maybe the answer or a workaround can be deduced
from them.

https://issues.apache.org/jira/browse/SOLR-3390

http://qnalist.com/questions/661133/solr-is-highlighting-wrong-words

http://qnalist.com/questions/667066/highlighting-marks-wrong-words
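
One more thing worth checking: highlight positions come from the token start/end offsets, so if
Jieba reports offsets that are shifted by one, you would see exactly the symptom you describe.
You can inspect the offsets outside the Admin UI by calling the field analysis handler directly,
e.g. (the collection name here is just a placeholder):

    http://localhost:8983/solr/collection1/analysis/field?analysis.fieldname=content&analysis.fieldvalue=自然環境与企業本身&analysis.query=自然環境与企業本身&analysis.showmatch=true&wt=json

If the start/end values in the response don't line up with the actual character positions in the
text you pass in, the bug is in the tokenizer's offsets rather than in the highlighter.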

Good luck!




Scott Chu, scott.chu@udngroup.com
2015/10/20 
----- Original Message ----- 
From: Zheng Lin Edwin Yeo 
To: solr-user 
Date: 2015-10-13, 17:04:29
Subject: Highlighting content field problem when using JiebaTokenizerFactory


Hi,

I'm trying to use the JiebaTokenizerFactory to index Chinese characters in Solr. The
segmentation works fine when I'm using the Analysis function in the Solr Admin UI.

However, when I try to do highlighting in Solr, it does not highlight in the correct place. For
example, when I search for 自然環境与企業本身, it highlights 認<em>為自然環</em><em>境</em><em>与企</em><em>業本</em>身的.

Even when I search for an English word like responsibility, it highlights
<em>*responsibilit</em>*y.

Basically, the highlighting is consistently off by one character or space.

This problem only happens in the content field, and not in any other field.
Does anyone know what could be causing the issue?

I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.


Regards,
Edwin


