lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Difference in behaviour between LowerCaseFilter and String.toLowerCase()
Date Fri, 30 Nov 2012 09:22:33 GMT
Sounds like a side effect of String.toLowerCase() and
Character.toLowerCase() giving different results. The former is
locale-dependent and can map one character to several; the latter
does a fixed one-to-one mapping.

http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
specifically mentions Turkish.

A Google search for "Character.toLowerCase() turkish" gets hits which
sound relevant.
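
For what it's worth, here is a minimal sketch of the difference, using
only the JDK (no Lucene). The class name LowerCaseDemo and the helper
perCharLower are made up for illustration; the helper only mimics what a
per-character filter would do and is not Lucene's actual code.

```java
import java.util.Locale;

final class LowerCaseDemo {
    // "Wikipedia" in Turkish, uppercase; U+0130 is capital I with dot above.
    static final String UPPER = "V\u0130K\u0130PED\u0130";

    // Hypothetical helper: lowercases one code point at a time via
    // Character.toLowerCase(int), so each input maps to exactly one output.
    static String perCharLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String perChar = perCharLower(UPPER);
        // Locale.ROOT stands in for any non-Turkish default locale.
        String root = UPPER.toLowerCase(Locale.ROOT);
        String turkish = UPPER.toLowerCase(Locale.forLanguageTag("tr-TR"));

        System.out.println(perChar.length());        // 8: U+0130 became plain 'i'
        System.out.println(root.length());           // 11: U+0130 became 'i' + U+0307
        System.out.println(perChar.equals(root));    // false
        System.out.println(perChar.equals(turkish)); // true
    }
}
```

So the quoted test would be expected to fail under a non-Turkish default
locale and pass under a Turkish one, which fits the missing combining
marks described below.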


--
Ian.


On Fri, Nov 30, 2012 at 3:30 AM, Trejkaz <trejkaz@trypticon.org> wrote:
> Hi all.
>
> I was trying to figure out what I was doing wrong in some of my own
> code, so I looked at LowerCaseFilter, since I thought I remembered it
> doing this correctly. Lo and behold, it failed the same test I had
> written.
>
> Is this a bug or an intentional difference in behaviour?
>
>     @Test
>     public void testConsistencyWithStringClass() {
>         // "Wikipedia" in Turkish, in uppercase.
>         String str = "V\u0130K\u0130PED\u0130";
>         TokenStream stream = new LowerCaseFilter(Version.LUCENE_36,
>             new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(str)));
>         assertTrue(stream.incrementToken());
>         assertEquals(str.toLowerCase(),
>             stream.getAttribute(CharTermAttribute.class).toString());
>     }
>
> This test fails on the assertEquals() because the actual string which
> comes back lacks some of the combining marks.
>
> The reason is that LowerCaseFilter is using Character.toLowerCase(),
> which is exactly the method causing the bug I'm experiencing in my own
> code, because equalsIgnoreCase() is using it and it's giving
> questionable results.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


