lucene-java-user mailing list archives

From AHMET ARSLAN <iori...@yahoo.com>
Subject LowerCaseFilter fails one letter (I) of Turkish alphabet
Date Mon, 30 Nov 2009 19:00:12 GMT
In the Turkish alphabet, the lowercase of I is not i but ı (LATIN SMALL LETTER DOTLESS I, U+0131).
LowerCaseFilter, which uses Character.toLowerCase(), gets exactly that one character wrong for
Turkish text.

http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
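
To make the difference concrete, here is a minimal sketch comparing the locale-insensitive Character.toLowerCase() with the locale-aware String.toLowerCase(Locale). The class and method names are made up for illustration and are not part of Lucene.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Lucene.
public class TurkishLowercaseDemo {

    // Locale-insensitive, per-character lowercasing.
    static char perChar(char c) {
        return Character.toLowerCase(c);
    }

    // Locale-aware lowercasing using the Turkish locale.
    static String withTurkishLocale(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        // Character.toLowerCase has no locale argument, so 'I' becomes 'i' (U+0069).
        System.out.println(Integer.toHexString(perChar('I')));
        // With the Turkish locale, "I" becomes dotless "ı" (U+0131).
        System.out.println(Integer.toHexString(withTurkishLocale("I").charAt(0)));
    }
}
```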

I am not sure whether it is worth adding a new TokenFilter for the Turkish language, but I see that
GreekLowerCaseFilter and RussianLowerCaseFilter already exist. It would be nice to see a
TurkishLowerCaseFilter in Lucene.

The wiki recommends asking the Lucene committers for permission before opening an issue. I can
provide a patch for this if you want (it is just a one-line change to the original
LowerCaseFilter).
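
The message does not include the patch, so the following is only a guess at what such a one-line change could look like: a per-character loop over a term buffer in which 'I' is special-cased to U+0131 before falling back to Character.toLowerCase(). The class name, method name, and buffer handling are assumptions for illustration and are not the actual Lucene code or API.

```java
// Hypothetical sketch, not the actual Lucene filter.
public class TurkishLowerCaseSketch {

    // Lowercases the first `length` chars of `buffer` in place,
    // mapping 'I' to dotless 'ı' (U+0131) as Turkish requires.
    static void lowerCaseInPlace(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            char c = buffer[i];
            // The assumed one-line change: special-case capital I.
            buffer[i] = (c == 'I') ? '\u0131' : Character.toLowerCase(c);
        }
    }

    public static void main(String[] args) {
        char[] term = "ISIK".toCharArray();
        lowerCaseInPlace(term, term.length);
        System.out.println(new String(term));
    }
}
```

A sketch like this covers only the capital I named in the subject line; it applies the Turkish rule unconditionally, so it would be appropriate only for text known to be Turkish.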

Thank you for your consideration.

Ahmet



      
