Date: Mon, 30 Jul 2007 13:36:25 -0700
From: "Chris Lu"
To: java-user@lucene.apache.org
Subject: Re: a question for french analyzer

Hi, Erick,

I added ISOLatin1AccentFilter to FrenchAnalyzer following Samir's tip, and
it works great! I think it's the right way to go. Problems like "You have to
store the data raw for display purposes if you want the accents to show
though" go away, since the original text and the analyzed tokens are already
handled separately by the built-in Analyzer mechanism. And it's pretty easy
to do!

However, are there any special cases you have run into? Not really knowing
French, I only tested one word, "fenêtre", and it is analyzed into "fenetre".

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 7/30/07, Erick Erickson wrote:
> Gosh, I sure hope not, because that would mean that we rolled our
> own for no good reason.
> We wound up just collapsing
> the input stream by substituting plain old 'e' for all the accented
> variants before indexing and before searching. Be *really* careful
> what character set you're using.
>
> Actually, we would still have had to roll our own because the
> character mapping was...er...wonky....
>
> You have to store the data raw for display purposes if you want the
> accents to show, though...
>
> Best
> Erick
>
> On 7/30/07, Chris Lu wrote:
> >
> > Hi,
> >
> > I am not a French speaker, but here are some questions regarding
> > the French analyzer:
> >
> > Is there any analyzer that can do this? Analyze accented letters into
> > their corresponding non-accented letters (é, è, ê, ë -> e), so that
> >
> > searching "fenêtre" (= window) finds all docs with "fenêtre" or "fenetre",
> > and
> > searching "fenetre" finds the same result, all docs with "fenêtre" or
> > "fenetre".
> >
> > The current analyzers, Snowball-French and FrenchAnalyzer, don't have
> > this feature.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
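For anyone who wants to see the folding idea from this thread outside Lucene, here is a minimal, stdlib-only Java sketch (Java 8+). It does not use or reproduce Lucene's ISOLatin1AccentFilter; as a stand-in it uses java.text.Normalizer (NFD decomposition, then deleting combining marks). The class name AccentFoldingDemo and its tiny in-memory index are hypothetical. It applies the same fold at index time and at query time, as Erick describes, and keeps the raw text for display, so searching "fenêtre" or "fenetre" returns the same documents.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of Lucene: Normalizer-based folding
// stands in for ISOLatin1AccentFilter.
public class AccentFoldingDemo {

    // Decompose to NFD ("ê" -> "e" + U+0302 combining circumflex),
    // then delete every combining mark (Unicode category M).
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Shared analysis for documents AND queries: lowercase, split on
    // anything that is not a letter or mark, then fold accents.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{M}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(fold(raw));
            }
        }
        return tokens;
    }

    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> stored = new ArrayList<>(); // raw text for display

    void add(String text) {
        int id = stored.size();
        stored.add(text); // keep the original, accents included
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Returns the stored (unfolded) text of docs containing every query term.
    List<String> search(String query) {
        Set<Integer> hits = null;
        for (String term : analyze(query)) {
            Set<Integer> ids = postings.getOrDefault(term, Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        List<String> results = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                results.add(stored.get(id));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        AccentFoldingDemo index = new AccentFoldingDemo();
        index.add("La fen\u00eatre est ouverte"); // "fenêtre", precomposed ê
        index.add("Une fenetre sans accent");
        index.add("La porte est ferm\u00e9e");
        // Both queries should list the first two documents, accents intact.
        System.out.println(index.search("fen\u00eatre"));
        System.out.println(index.search("fenetre"));
    }
}
```

Ligatures such as "œ" and "æ" have no canonical decomposition, so this approach leaves them untouched; a hand-written mapping table would be needed for those. The fold also only works on correctly decoded strings, which is one concrete reason to be careful about the input character set.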