lucene-java-user mailing list archives

From Chitra <chithu.r...@gmail.com>
Subject Re: Accent insensitive search for greek characters
Date Tue, 24 Oct 2017 12:16:58 GMT
Hi,

ICUTransformFilter is working fine for Greek characters alone, as required,
but it breaks in one case: σ and ς are both lowercase forms of Σ (sigma).

*Example:*

I indexed the terms πελάτης (indexed as πελατης) and πελάτηΣ (also indexed as
πελατης). I get the expected search results when I search for πελάτηΣ,
πελάτης, or any combination of uppercase and lowercase Greek characters. But
if I search for πελατησ, I get no search results.

In Greek, σ and ς are both lowercase forms of Σ (sigma); ς is used only at the
end of a word. ICUFoldingFilter already handles this case.
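
To make the mismatch concrete, here is a minimal sketch that uses only the JDK
(java.text.Normalizer and String.toLowerCase), not ICU or Lucene. It is an
approximation of the lowercase-and-strip-accents chain, not the actual
ICUTransformFilter behaviour. The class name GreekSigmaFold and both method
names are hypothetical, chosen only for illustration. The second method adds
the step that seems to be missing: mapping final sigma (ς) to regular sigma (σ).

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, JDK only; it approximates the filter chain for illustration.
final class GreekSigmaFold {

    // Lowercase, decompose, drop nonspacing marks, recompose.
    // Java's toLowerCase turns a word-final capital sigma into final sigma (U+03C2).
    static String lowerAndStripAccents(String s) {
        String lower = s.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{Mn}", "");
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    // Same chain, plus folding final sigma (U+03C2) to regular sigma (U+03C3),
    // so that both lowercase forms end up as the same term.
    static String foldFinalSigma(String s) {
        return lowerAndStripAccents(s).replace('\u03C2', '\u03C3');
    }
}
```

With this sketch, lowerAndStripAccents gives different strings for πελάτηΣ and
πελατησ, which matches the missing search results, while foldFinalSigma gives
the same string for πελάτης, πελάτηΣ and πελατησ.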


Is my ICU Transliterator rule formed correctly? Please look at the code below.


TokenStream tok = new ICUTransformFilter(tok,
        Transliterator.getInstance(
                "Greek; Lower; NFD; [:Nonspacing Mark:] Remove; NFC;"));



Kindly help me to resolve this.


Regards,
Chitra
