lucene-java-user mailing list archives

From Dawid Weiss <>
Subject Re: Dubious stuff spotted in LowerCaseFilter
Date Thu, 22 Oct 2015 10:00:46 GMT
> LowerCaseFilter will not handle that. So whereas it is "safe" for
> English hard-coded strings, it isn't safe for all fields you might
> index in general.

This filter is a "safe" fallback that works identically regardless of
the locale you have on your computer (or on the server). This, I
believe, is good and avoids the nasty surprises of a locale-sensitive
environment. Contrary to intuition, locale-sensitive methods are more
often a headache and a source of problems than a benefit.
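
To illustrate the difference, here is a minimal sketch in plain Java
(no Lucene involved). The class and method names are hypothetical and
chosen only for this example. It contrasts String.toLowerCase(Locale),
whose result depends on the locale passed in, with per-code-point
Character.toLowerCase(int), which applies the same mapping everywhere.

```java
import java.util.Locale;

final class LocaleLowercaseDemo {
    /** Lowercases with the case rules of the given locale. */
    static String lowerWithLocale(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    /** Lowercases code point by code point; Character.toLowerCase(int) ignores the locale. */
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().map(Character::toLowerCase).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under Turkish rules, capital I becomes dotless i (U+0131).
        System.out.println(lowerWithLocale("TITLE", turkish));
        // Under root-locale rules, capital I becomes plain i.
        System.out.println(lowerWithLocale("TITLE", Locale.ROOT));
        // The per-code-point mapping gives the same answer on every machine.
        System.out.println(lowerPerCodePoint("TITLE"));
    }
}
```

With a Turkish locale the first call produces a dotless i in place of
the "I", which is exactly the kind of surprise a locale-independent
filter avoids when the index and the query are processed on machines
with different default locales.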

If you live in Turkey (or index Turkish text), then I think you should
be using the dedicated TurkishLowerCaseFilter, which handles Turkish
letter conversion better.

> Hopefully Unicode will never add a code point which lowercases to one
> with less code units (or I guess changes one of the lower ones to
> lowercase to more than one...)

I agree this is an assumption that will hold... but if you care to provide a
patch then a simple test case like the one I provided would be (I
believe) sufficient to ensure this situation is captured early on
during automated testing.
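
For illustration only, a check of this kind might look like the sketch
below. It is not the test case referred to above (that one is not part
of this message), and the class and method names are hypothetical. It
assumes the property of interest is that a code point and its
Character.toLowerCase(int) result occupy the same number of UTF-16 code
units, which is what an in-place lowercasing of a char[] buffer relies
on.

```java
import java.util.ArrayList;
import java.util.List;

final class LowercaseLengthCheck {
    /** True if cp and its lowercase form need the same number of UTF-16 code units. */
    static boolean sameCharCount(int cp) {
        return Character.charCount(cp) == Character.charCount(Character.toLowerCase(cp));
    }

    /** Scans every code point and collects those that break the assumption. */
    static List<Integer> findViolations() {
        List<Integer> violations = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!sameCharCount(cp)) {
                violations.add(cp);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // The result depends on the Unicode version shipped with the running JVM.
        List<Integer> violations = findViolations();
        System.out.println("Code points whose lowercase changes UTF-16 length: " + violations.size());
    }
}
```

Wrapped in a unit test that fails when findViolations() returns a
non-empty list, a scan like this would flag the problem as soon as a
JVM ships a Unicode version that breaks the assumption.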

