lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Accent insensitive search for greek characters
Date Tue, 24 Oct 2017 13:21:23 GMT
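Your Greek transform does not work because it uses "Lower" instead of
case folding. Lowercasing turns a word-final Σ into ς, while a query
typed with σ stays σ, so the two forms never match. Case folding maps
Σ, σ and ς all to σ.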
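
The difference can be reproduced with the JDK alone. The following is a minimal sketch, not the ICU or Lucene code path: the class and method names are made up for illustration, and upper-casing then lower-casing each code point is only an approximation of Unicode simple case folding. The escaped strings are πελάτηΣ, πελάτης and πελατησ.

```java
import java.text.Normalizer;
import java.util.Locale;

public class SigmaFoldingDemo {

    // Decompose (NFD), drop nonspacing marks, recompose (NFC).
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return Normalizer.normalize(decomposed.replaceAll("\\p{Mn}+", ""), Normalizer.Form.NFC);
    }

    // Mirrors the "Lower" step: String.toLowerCase is context-sensitive,
    // so a word-final capital sigma becomes the final form U+03C2.
    static String lowerThenStrip(String s) {
        return stripAccents(s.toLowerCase(Locale.ROOT));
    }

    // Approximates case folding per code point (an assumption of this sketch):
    // U+03A3, U+03C3 and U+03C2 all end up as U+03C3.
    static String foldThenStrip(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
            sb.appendCodePoint(Character.toLowerCase(Character.toUpperCase(cp))));
        return stripAccents(sb.toString());
    }

    public static void main(String[] args) {
        String indexedTerm = "\u03C0\u03B5\u03BB\u03AC\u03C4\u03B7\u03A3"; // accented, capital final sigma
        String query = "\u03C0\u03B5\u03BB\u03B1\u03C4\u03B7\u03C3";       // unaccented, non-final sigma

        // Lowercasing: the indexed term and the query come out different.
        System.out.println(lowerThenStrip(indexedTerm).equals(lowerThenStrip(query)));
        // Folding: both come out the same.
        System.out.println(foldThenStrip(indexedTerm).equals(foldThenStrip(query)));
    }
}
```

With lowercasing, the indexed term ends in ς and the query ends in σ. With folding, both end in σ.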

If ICUFoldingFilter does what you want, but you want to restrict it to
Greek, then restrict it to the Greek range. See the FilteredNormalizer2
and UnicodeSet documentation, and look at how ICUFoldingFilter is
implemented in the source code so you understand how to instantiate an
equivalent ICUNormalizer2Filter with just the Greek restriction.
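
The idea of folding only inside a restricted character set can be illustrated without ICU. The sketch below is a JDK-only stand-in and makes several assumptions: the names GreekOnlyFolding and foldGreekOnly are hypothetical, Character.UnicodeScript.GREEK plays the role that a UnicodeSet would play in FilteredNormalizer2, and the folding itself is a rough approximation rather than the normalization data ICUFoldingFilter uses. The escaped string in main is "Café πελάτηΣ".

```java
import java.text.Normalizer;

public class GreekOnlyFolding {

    // Fold a single code point: approximate case folding, then drop accents.
    static String foldCodePoint(int cp) {
        int folded = Character.toLowerCase(Character.toUpperCase(cp));
        String decomposed = Normalizer.normalize(
            new String(Character.toChars(folded)), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }

    // Apply the fold only to code points whose script is Greek and pass
    // everything else through untouched. Limitation of this sketch: combining
    // marks that arrive as separate code points are not script GREEK and are
    // therefore left in place.
    static String foldGreekOnly(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.GREEK) {
                sb.append(foldCodePoint(cp));
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Latin word keeps its case and accent; the Greek word is folded.
        System.out.println(foldGreekOnly("Caf\u00E9 \u03C0\u03B5\u03BB\u03AC\u03C4\u03B7\u03A3"));
    }
}
```

In an analyzer, the same restriction would be expressed by wrapping the folding normalizer in a FilteredNormalizer2 with a Greek UnicodeSet and passing the result to ICUNormalizer2Filter, as described above.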

On Tue, Oct 24, 2017 at 8:16 AM, Chitra <chithu.r111@gmail.com> wrote:
> Hi,
> ICUTransformFilter works fine for Greek characters alone, as required,
> but it breaks in one case (σ and ς are both lowercase forms of Σ, sigma).
>
> *Example:*
>
> I indexed the terms πελάτης (indexed as πελατης) and πελάτηΣ (indexed as
> πελατης). I get the expected search results if I search for πελάτηΣ,
> πελάτης, or any combination of upper- and lowercase Greek characters.
> But if I search for πελατησ, I get no search results.
>
> In Greek, σ and ς are both lowercase forms of Σ (sigma), and
> ICUFoldingFilter handles this case.
>
>
> Is the ICU Transliterator rule formed correctly? Please see the code below.
>
>
> TokenStream tok = new ICUTransformFilter(tok,
>     Transliterator.getInstance("Greek; Lower; NFD; [:Nonspacing Mark:] Remove; NFC;"));
>
>
>
> Kindly help me to resolve this.
>
>
> Regards,
> Chitra
