lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Dubious stuff spotted in LowerCaseFilter
Date Thu, 22 Oct 2015 08:05:50 GMT
Hi,

> Setting aside the fact that Character.toLowerCase is already dubious in some locales
> (e.g. Turkish),

This is not true. Character.toLowerCase() is locale-independent. Only String.toLowerCase()
uses the default locale.
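
To make the difference concrete, here is a minimal sketch (the class and method names
are made up for illustration, they are not part of Lucene). It lowercases 'I' once
with Character.toLowerCase(int) and once with String.toLowerCase(Locale), using an
explicit Turkish locale so the result does not depend on the machine's default:

```java
import java.util.Locale;

// Hypothetical demo class, not Lucene code.
class LowerCaseLocaleDemo {

    /** Lowercases one code point; no Locale is involved at all. */
    public static int lowerCodePoint(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    /** Lowercases a string using the rules of the given locale. */
    public static String lowerString(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Character.toLowerCase maps 'I' (U+0049) to 'i' (U+0069),
        // whatever the default locale is.
        System.out.println((char) lowerCodePoint('I'));

        // String.toLowerCase with a Turkish locale maps "I" to the
        // dotless "ı" (U+0131); with Locale.ROOT it maps to "i".
        System.out.println(lowerString("I", turkish));
        System.out.println(lowerString("I", Locale.ROOT));
    }
}
```

String.toLowerCase() without arguments behaves like the second method called with
Locale.getDefault(), which is where the Turkish surprise usually comes from.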

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Trejkaz [mailto:trejkaz@trypticon.org]
> Sent: Thursday, October 22, 2015 7:15 AM
> To: Lucene Users Mailing List
> Subject: Dubious stuff spotted in LowerCaseFilter
> 
> Hi all.
> 
> LowerCaseFilter uses CharacterUtils.toLowerCase to perform its work.
> The latter method looks like this:
> 
> public final void toLowerCase(final char[] buffer, final int offset, final int limit)
> {
>   assert buffer.length >= limit;
>   assert offset <=0 && offset <= buffer.length;
>   for (int i = offset; i < limit;) {
>     i += Character.toChars(
>             Character.toLowerCase(
>                 codePointAt(buffer, i, limit)), buffer, i);
>   }
> }
> 
> Setting aside the fact that Character.toLowerCase is already dubious in some
> locales (e.g. Turkish), I notice that this is using the same "i" index counter to
> refer to both the source offset and the destination offset. So basically, this
> code has an undocumented assumption that Character.toLowerCase always
> returns a code point which takes up the same number of characters as the
> original one.
> 
> Whereas I do suppose that this might be the case, did someone actually
> verify it? Say, by iterating all code points or something? How confident are
> we that this will continue to be the case forever? :)
> 
> TX
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
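
The check suggested in the original message (iterating all code points) could be
sketched roughly as follows. This is an illustrative stand-alone program, not a
Lucene test; the class and method names are assumptions. It reports every code point
whose Character.toLowerCase(int) result needs a different number of UTF-16 chars
than the input, which is exactly the case that would break the in-place loop quoted
above. The outcome depends on the Unicode data shipped with the JDK in use, so it
would have to be re-run on each JDK version of interest.

```java
// Hypothetical verification sketch, not part of Lucene.
class LowerCaseLengthCheck {

    /** True if the lowercase form occupies as many chars as the original. */
    public static boolean sameLength(int codePoint) {
        int lower = Character.toLowerCase(codePoint);
        return Character.charCount(codePoint) == Character.charCount(lower);
    }

    /** Scans every code point and counts those whose char count changes. */
    public static int countLengthChanging() {
        int mismatches = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!sameLength(cp)) {
                mismatches++;
                // Print offenders so they can be inspected by hand.
                System.out.printf("U+%04X -> U+%04X%n", cp, Character.toLowerCase(cp));
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("length-changing code points: " + countLengthChanging());
    }
}
```

A count of zero would only show that the assumption holds for that particular JDK
and its Unicode tables; it says nothing about future Unicode versions.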

