lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Dubious stuff spotted in LowerCaseFilter
Date Thu, 22 Oct 2015 10:00:46 GMT
> LowerCaseFilter will not handle that. So whereas it is "safe" for
> English hard-coded strings, it isn't safe for all fields you might
> index in general.

This filter is a "safe" fallback that works identically regardless of
the locale configured on your computer (or on the server). This, I
believe, is good: it avoids the nasty surprises of a locale-sensitive
environment. Contrary to intuition, locale-sensitive methods are more
often a headache and a source of problems than a source of value.

If you live in Turkey then I think you should be using the dedicated
TurkishLowerCaseFilter, which handles Turkish letter conversion better.
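
To make the difference concrete, here is a minimal sketch in plain Java
(no Lucene involved). The class and method names are made up for
illustration, and lowerPerCodePoint only approximates, as an assumption,
what a locale-independent filter does: it maps each code point with
Character.toLowerCase(int). The locale-sensitive call changes its result
under a Turkish locale, the per-code-point version does not.

```java
import java.util.Locale;

public class LocaleLowerCaseDemo {
    // Locale-sensitive: the result depends on the locale passed in.
    static String lowerWithLocale(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Locale-independent: each code point is mapped on its own with
    // Character.toLowerCase(int), which ignores any locale setting.
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().map(Character::toLowerCase).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under Locale.ROOT, 'I' lowercases to 'i'.
        System.out.println(lowerWithLocale("TITLE", Locale.ROOT));
        // Under a Turkish locale, 'I' lowercases to dotless 'ı' (U+0131).
        System.out.println(lowerWithLocale("TITLE", turkish));
        // The per-code-point version gives the same answer everywhere.
        System.out.println(lowerPerCodePoint("TITLE"));
    }
}
```

The same input gives "title" in one environment and "tıtle" in another
only when the locale-sensitive method is used, which is exactly the kind
of surprise a locale-independent fallback avoids.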

> Hopefully Unicode will never add a code point which lowercases to
> one with less code units (or I guess changes one of the lower ones
> to lowercase to more than one...)

I agree this is an assumption that will hold... but if you care to
provide a patch then a simple test case like the one I provided would
be (I believe) sufficient to ensure this situation is caught early on
during automated testing.
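
For illustration only, a check of this assumption could look like the
sketch below. It is not the test case mentioned above; the class and
method names are hypothetical. It walks every Unicode code point and
verifies that Character.toLowerCase(int) returns a code point occupying
the same number of UTF-16 code units, so an in-place conversion of a
char buffer can never grow or shrink.

```java
public class LowerCaseLengthCheck {
    // True if lowercasing this code point keeps the same number of
    // UTF-16 code units (1 for BMP, 2 for supplementary code points).
    static boolean keepsCharCount(int codePoint) {
        int lower = Character.toLowerCase(codePoint);
        return Character.charCount(lower) == Character.charCount(codePoint);
    }

    // Scans the whole code point range and fails on the first violation.
    static void assertLowercasePreservesLength() {
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!keepsCharCount(cp)) {
                throw new AssertionError(String.format(
                    "U+%04X lowercases to U+%04X with a different code unit count",
                    cp, Character.toLowerCase(cp)));
            }
        }
    }

    public static void main(String[] args) {
        assertLowercasePreservesLength();
        System.out.println("lowercasing preserved the code unit count for every code point");
    }
}
```

Run as part of the automated tests, a check like this would fail as soon
as a JDK ships Unicode data that breaks the assumption.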

Dawid


