lucene-solr-user mailing list archives

From Rik Tamm-Daniels <...@attivio.com>
Subject Re: ICUTokenizer acting very strangely with oriental characters
Date Tue, 12 Aug 2014 23:57:07 GMT
mmn

jnbbbjb)nkkkk9nooooooon

Sent from my HTC

----- Reply message -----
From: "Shawn Heisey" <solr@elyograg.org>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Subject: ICUTokenizer acting very strangely with oriental characters
Date: Tue, Aug 12, 2014 19:00

See the original message on this thread for full details.  Some
additional information:

This happens on versions 4.6.1, 4.7.2, and 4.9.0.  Here is a screenshot
showing the analysis problem in more detail.  The first line visible in
the screenshot is the ICUTokenizer output.

https://www.dropbox.com/s/9wbi7lz77ivya9j/ICUTokenizer-wrong-analysis.png

The original field value was:

20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導者・軍人;[政 治],100peopeof20century,pploftwentycentury,pploftwentycentury

Thanks,
Shawn
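
For anyone trying to reproduce this outside the Solr analysis screen, here
is a rough sketch that shows where the field value above changes script.
The Lucene documentation says ICUTokenizer breaks words across script
boundaries, so looking at the script runs may help when reading the
screenshot. This is not ICU's algorithm: it guesses the script from Unicode
character names using only the Python standard library, and the names
script_of and script_runs are made up for illustration. The field value is
written on one line, assuming the line break in the email was mail wrapping.

```python
import unicodedata
from itertools import groupby

# Field value from the message, joined onto one line (assumption: the
# line break in the email was introduced by mail wrapping).
FIELD_VALUE = (
    "20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導者・軍人;"
    "[政 治],100peopeof20century,pploftwentycentury,pploftwentycentury"
)


def script_of(ch):
    """Approximate script label derived from the Unicode character name.

    Hypothetical helper: a heuristic, not the real Unicode Script property.
    """
    # Punctuation first, so that U+30FB KATAKANA MIDDLE DOT is not
    # counted as Katakana just because of its name.
    if unicodedata.category(ch).startswith("P"):
        return "Punct"
    if ch.isdigit():
        return "Digit"
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED"):
        return "Han"
    if name.startswith("HIRAGANA"):
        return "Hiragana"
    if name.startswith("KATAKANA"):
        return "Katakana"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"


def script_runs(text):
    """Group consecutive characters that share the same approximate script."""
    return [
        (script, "".join(chars))
        for script, chars in groupby(text, key=script_of)
    ]


if __name__ == "__main__":
    for script, run in script_runs(FIELD_VALUE):
        print(f"{script:9} {run!r}")
```

Comparing these runs with the token boundaries in the screenshot only
narrows down where to look; the actual segmentation is decided by ICU and
the tokenizer configuration.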

