From: Robert Muir
Date: Tue, 1 Dec 2009 13:43:18 -0500
Message-ID: <8f0ad1f30912011043k4afe442eue41a09b808ac66b7@mail.gmail.com>
In-Reply-To: <655947.17150.qm@web52906.mail.re2.yahoo.com>
Subject: Re: LowerCaseFilter fails one letter (I) of Turkish alphabet
To: java-user@lucene.apache.org

Hi Ahmet,

After thinking about what Shai brought up, I changed my mind: I think it is
not good enough that we only have Collation as a way to solve this, because
you might want Turkish stemming too, and right now there is no way for the
included Snowball Turkish stemmer to work. I really do not like this.

So as much as I want to reduce clutter and not have lots of filters for
problems that can be solved in a general way with Unicode, I think this is
one case where the best solution for now would be to have a Turkish-specific
lowercase filter...

I don't think we have to use String for this either; we can just apply rules
to the two uppercase I's, and lowercase everything else.

Will you open an issue?

On Mon, Nov 30, 2009 at 2:00 PM, AHMET ARSLAN wrote:

> In Turkish alphabet lowercase of I is not i. It is LATIN SMALL LETTER
> DOTLESS I. LowerCaseFilter which uses Character.toLowerCase() makes mistake
> just for that character.
>
> http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
>
> I am not sure if it is worth to add a new TokenFilter for Turkish language.
> I see there exist GreekLowerCaseFilter and RussianLowerCaseFilter. It would
> be nice to see TurkishLowerCaseFilter in Lucene.
>
> Wiki recommends to ask permission from lucene committers before opening an
> issue. I can provide a patch (although it is just a one line change in
> original LowercaseFilter) for that if you want.
>
> Thank you for your consideration.
>
> Ahmet

-- 
Robert Muir
rcmuir@gmail.com
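
For illustration only, the rule described in the reply (special-case the two
uppercase I's, lowercase everything else, work on a char buffer rather than a
String) could be sketched roughly as below. This is not the patch discussed in
the thread and does not use any Lucene API; the class name
TurkishLowerCaseSketch and its two methods are hypothetical, and the sketch
assumes terms contain no supplementary (surrogate-pair) characters.

```java
// Hypothetical sketch, not Lucene code: Turkish-aware lowercasing of a term
// held in a char buffer, the way a token filter might process its term text.
final class TurkishLowerCaseSketch {

    private static final char LATIN_CAPITAL_I = '\u0049';                // I
    private static final char LATIN_CAPITAL_I_WITH_DOT_ABOVE = '\u0130'; // İ
    private static final char LATIN_SMALL_I = '\u0069';                  // i
    private static final char LATIN_SMALL_DOTLESS_I = '\u0131';          // ı

    private TurkishLowerCaseSketch() {}

    // Lowercases the first `length` chars of `buffer` in place.
    // Assumption: no surrogate pairs; Character.toLowerCase(char) leaves
    // surrogates unchanged, so supplementary characters are not lowercased.
    static void lowerCaseInPlace(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            switch (buffer[i]) {
                case LATIN_CAPITAL_I:
                    // Turkish rule: dotless capital I lowercases to dotless ı.
                    buffer[i] = LATIN_SMALL_DOTLESS_I;
                    break;
                case LATIN_CAPITAL_I_WITH_DOT_ABOVE:
                    // Turkish rule: dotted capital İ lowercases to dotted i.
                    buffer[i] = LATIN_SMALL_I;
                    break;
                default:
                    // Everything else uses the locale-independent mapping.
                    buffer[i] = Character.toLowerCase(buffer[i]);
            }
        }
    }

    // Convenience wrapper for trying the rule on a whole String.
    static String lowerCase(String term) {
        char[] buffer = term.toCharArray();
        lowerCaseInPlace(buffer, buffer.length);
        return new String(buffer);
    }
}
```

For example, TurkishLowerCaseSketch.lowerCase("ISIK") yields the term with
dotless ı in both positions, whereas Character.toLowerCase('I') alone returns
'i', which is the mismatch reported in the quoted message.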