lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Difference in behaviour between LowerCaseFilter and String.toLowerCase()
Date Sat, 01 Dec 2012 09:02:51 GMT
Iterating character by character is different from considering the
entire string at once, so your observation is correct: that's how it's
supposed to work. In particular, note this in the String#toLowerCase
documentation:

"Since case mappings are not always 1:1 char mappings, the resulting
String may be a different length than the original String."

So it simply cannot be the same as iterating char-by-char, which
always produces exactly one output char per input char.
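
A minimal sketch of the difference, assuming U+0130 (capital I with dot
above) as the test input and Locale.ROOT to keep the result independent
of the default locale. The helper lowerPerChar is hypothetical and only
mimics a char-by-char loop; it is not LowerCaseFilter's actual code.

```java
import java.util.Locale;

class LowerCaseDemo {
    // Hypothetical helper: lower-cases one UTF-16 char at a time,
    // so the output always has the same length as the input.
    static String lowerPerChar(String s) {
        char[] out = s.toCharArray();
        for (int i = 0; i < out.length; i++) {
            out[i] = Character.toLowerCase(out[i]);
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String input = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Whole-string mapping: may expand to more than one char.
        String whole = input.toLowerCase(Locale.ROOT);
        // Char-by-char mapping: strictly 1:1.
        String perChar = lowerPerChar(input);

        System.out.println(whole.length());        // 2 ("i" + U+0307 combining dot above)
        System.out.println(perChar.length());      // 1 (plain "i")
        System.out.println(whole.equals(perChar)); // false
    }
}
```

With a Turkish locale, String#toLowerCase maps the same input to a
plain "i" instead, which is the locale dependence Ian mentioned.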

Dawid

On Sat, Dec 1, 2012 at 6:32 AM, Trejkaz <trejkaz@trypticon.org> wrote:
> On Fri, Nov 30, 2012 at 8:22 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> Sounds like a side effect of possibly different, locale-dependent,
>> results of using String.toLowerCase() and/or Character.toLowerCase().
>>
>> http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
>> specifically mentions Turkish.
>>
>> A Google search for "Character.toLowerCase() turkish" gets hits which
>> sound relevant.
>
> Certainly Turkish has special rules because of that uppercase I with
> dot. I was more wondering whether LowerCaseFilter was intentionally
> doing it differently to String.toLowerCase() or whether it was some
> kind of unintentional side-effect of using Character.toLowerCase()
> iteratively.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>