lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Difference in behaviour between LowerCaseFilter and String.toLowerCase()
Date Fri, 30 Nov 2012 03:30:34 GMT
Hi all.

I was trying to figure out what I was doing wrong in some of my own code,
so I looked at LowerCaseFilter, since I remembered it handling this case
correctly. Lo and behold, it failed the same test I had written.

Is this a bug or an intentional difference in behaviour?

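    import static org.junit.Assert.*;
    import java.io.IOException;
    import java.io.StringReader;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;
    import org.junit.Test;

    public class LowerCaseFilterConsistencyTest {
        @Test
        public void testConsistencyWithStringClass() throws IOException {
            // "Wikipedia" in Turkish, in uppercase.
            String str = "V\u0130K\u0130PED\u0130";
            TokenStream stream = new LowerCaseFilter(Version.LUCENE_36,
                new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(str)));
            stream.reset();
            assertTrue(stream.incrementToken());
            assertEquals(str.toLowerCase(), stream.getAttribute(CharTermAttribute.class).toString());
        }
    }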

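This test fails at the assertEquals(), because the string that comes
back lacks some of the combining marks: with a non-Turkish default
locale, String.toLowerCase() turns each U+0130 into "i" followed by
U+0307 (COMBINING DOT ABOVE), while the filter produces a plain "i".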
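To make the missing marks concrete, here is a minimal standalone sketch (no Lucene dependency) that compares lowercasing each code point with Character.toLowerCase(int), which is roughly what LowerCaseFilter does according to this message, against whole-string String.toLowerCase(Locale.ROOT). The class and method names are hypothetical, and perCodePoint is a simplification, not LowerCaseFilter's actual source.

```java
import java.util.Locale;

public class LowerCaseComparison {
    // Hypothetical helper: lowercases each code point independently with
    // Character.toLowerCase(int), which only applies simple 1:1 mappings.
    public static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Hypothetical helper: whole-string lowercasing, which also applies
    // one-to-many special casings such as U+0130 -> U+0069 U+0307.
    public static String wholeString(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String str = "V\u0130K\u0130PED\u0130";
        System.out.println(perCodePoint(str).length()); // prints 8
        System.out.println(wholeString(str).length());  // prints 11
    }
}
```

Locale.ROOT is used so the result does not depend on the default locale. With a Turkish locale, String.toLowerCase() maps U+0130 to a plain "i", and the two approaches would agree on this particular input.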

The reason is that LowerCaseFilter uses Character.toLowerCase(), which
is exactly the method behind the bug in my own code: equalsIgnoreCase()
relies on it, and it gives questionable results.
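The equalsIgnoreCase() inconsistency can be seen in isolation with a small sketch like the following; the class and method names are hypothetical. String.equalsIgnoreCase() compares character by character using Character.toUpperCase() and Character.toLowerCase(), so U+0130 counts as equal to a plain "i", even though whole-string lowercasing of U+0130 does not yield a plain "i".

```java
import java.util.Locale;

public class IgnoreCaseComparison {
    // Per-character comparison with simple case mappings (String.equalsIgnoreCase).
    public static boolean viaEqualsIgnoreCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // Comparison after whole-string lowercasing with a fixed locale.
    public static boolean viaToLowerCase(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String upper = "V\u0130K\u0130PED\u0130";
        String lower = "vikipedi";
        System.out.println(viaEqualsIgnoreCase(upper, lower)); // prints true
        System.out.println(viaToLowerCase(upper, lower));      // prints false
    }
}
```

The two notions of case-insensitive equality disagree on this input, because only the second one applies the one-to-many mapping of U+0130.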

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

