lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: Dubious stuff spotted in LowerCaseFilter
Date Thu, 22 Oct 2015 09:53:02 GMT
On Thu, Oct 22, 2015 at 7:05 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Hi,
>
>> Setting aside the fact that Character.toLowerCase is already dubious in some locales (e.g. Turkish),
>
> This is not true. Character.toLowerCase() is locale-independent.
> It is only String.toLowerCase that uses the default locale.

Yet if you have a field like "title" and both the user and the system
are Turkish, the user would expect Turkish casing rules to apply, and
LowerCaseFilter will not handle that. So while it is "safe" for
hard-coded English strings, it isn't safe for every field you might
index.
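
A small sketch of the distinction above. Character.toLowerCase applies the locale-independent Unicode mapping, so 'I' always becomes 'i', while String.toLowerCase(Locale) with a Turkish locale maps 'I' to dotless 'ı' (U+0131) and 'İ' (U+0130) to 'i'. The class and method names (CaseDemo, lowerPerChar, lowerIn) are illustrative only and are not part of Lucene.

```java
import java.util.Locale;

public class CaseDemo {
    // Lowercases one code point at a time with Character.toLowerCase,
    // which ignores the locale entirely.
    static String lowerPerChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
        return sb.toString();
    }

    // Lowercases using the casing rules of an explicit locale.
    // (The no-argument String.toLowerCase() uses the JVM default locale.)
    static String lowerIn(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(lowerPerChar("TITLE"));         // title
        System.out.println(lowerIn("TITLE", Locale.ROOT)); // title
        System.out.println(lowerIn("TITLE", turkish));     // tıtle (U+0131 dotless i)
    }
}
```

A Turkish user searching a "title" field would expect the last behaviour, which a per-character, locale-free lowercasing step cannot provide.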

Dawid's response shows, though, that at least for the time being,
there is nothing to worry about. Hopefully Unicode will never add a
code point that lowercases to one with fewer UTF-16 code units (or, I
guess, change a BMP code point to lowercase to one needing more than
one code unit...)
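
A hedged sketch of why the number of code units matters, assuming (as the thread implies but does not show) that the filter lowercases the term's char[] buffer in place. InPlaceLower, lowerInPlace and countLengthChangingMappings are hypothetical names, not Lucene's code. The helper rewrites each code point where it stands and refuses any mapping that would change its UTF-16 length; the counter scans every code point using the running JDK's Unicode tables, so its result is not a claim about what Dawid found.

```java
public class InPlaceLower {
    // Hypothetical helper: lowercases buf[0..len) in place, one code point at a time.
    // An in-place rewrite only works if each lowercase form occupies exactly as many
    // UTF-16 code units as the original, so any other mapping is rejected before writing.
    static void lowerInPlace(char[] buf, int len) {
        int i = 0;
        while (i < len) {
            int cp = Character.codePointAt(buf, i, len);
            int lower = Character.toLowerCase(cp);
            if (Character.charCount(lower) != Character.charCount(cp)) {
                throw new IllegalStateException(String.format(
                        "U+%04X lowercases to U+%04X with a different UTF-16 length", cp, lower));
            }
            i += Character.toChars(lower, buf, i); // returns the number of chars written
        }
    }

    // Counts code points whose lowercase form needs a different number of UTF-16
    // code units. The answer depends on the Unicode version bundled with the JDK.
    static int countLengthChangingMappings() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.charCount(Character.toLowerCase(cp)) != Character.charCount(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I lowercases to U+10428; both need two chars.
        char[] buf = "HeLLo \uD801\uDC00".toCharArray();
        lowerInPlace(buf, buf.length);
        System.out.println(new String(buf).equals("hello \uD801\uDC28")); // true
        System.out.println(countLengthChangingMappings());
    }
}
```

Checking the length before calling Character.toChars matters: writing first and checking afterwards could already have overwritten the next character in the buffer.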

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

