lucene-java-user mailing list archives

From Michael Sokolov <msoko...@gmail.com>
Subject ICUFoldingFilter
Date Mon, 04 Jun 2018 13:07:15 GMT
Hi, I'm using ICUFoldingFilter, and for the most part it does exactly what I
want. However, there are some behaviors I'd like to tweak. For example, it
maps "aaa^bbb" to "aaabbb". I am trying to understand why it does that, and
whether there is any way to prevent it.

I spent a little time with
http://www.unicode.org/reports/tr30/tr30-4.html#UnicodeData which I guess
is the basis for what this filter does (it's referenced in the javadocs),
but that didn't answer my questions. As an aside, it seems this tech report
was withdrawn by the Unicode Consortium? Not sure what that means, if
anything, but it seems ominous.

Anyway, I would appreciate pointers to more info, and specifically, whether
there are any alternatives to the utr30.nrm data file, or any possibility
to select among the many transformations this filter applies.

Thanks!

Mike S
