Message-ID: <8220B8D6B56A9A4E8480489F38E42529E17CDD@rw-msg-02.broadvision.com>
From: "Zhang, Lisheng"
To: "'java-user@lucene.apache.org'"
Subject: RE: Search with accents
Date: Tue, 1 Aug 2006 16:40:01 -0700

Hi,

In this case I guess we need to find out exactly what
BrazilianAnalyzer does to the input string:

    BrazilianAnalyzer braAnalyzer = new BrazilianAnalyzer();
    TokenStream ts1 = braAnalyzer.tokenStream("text", new StringReader(queryStr));
    // ... what does BrazilianAnalyzer do?

and also what exactly ISOLatin1AccentFilter does:

    WhitespaceAnalyzer wsAnalyzer = new WhitespaceAnalyzer();
    TokenStream tmpts = wsAnalyzer.tokenStream("text", new StringReader(queryStr));
    TokenStream ts2 = new ISOLatin1AccentFilter(tmpts);
    // ... what does ISOLatin1AccentFilter do?

That should show what is going wrong with ts1, and whether
ts2 does a better job.

I have never used ISOLatin1AccentFilter before, so I am not
sure this is really the right way to test it; I am merely
suggesting one way to test.

Best regards, Lisheng

-----Original Message-----
From: Eduardo S. Cordeiro [mailto:escordeiro@gmail.com]
Sent: Tuesday, August 01, 2006 2:34 PM
To: java-user@lucene.apache.org
Subject: Re: Search with accents

Yes... here's how I create my QueryParser:

    QueryParser parser = new QueryParser("text", new BrazilianAnalyzer());

2006/8/1, Zhang, Lisheng :
> Hi,
>
> Have you used the same BrazilianAnalyzer when
> searching?
>
> Best regards, Lisheng
>
> -----Original Message-----
> From: Eduardo S. Cordeiro [mailto:escordeiro@gmail.com]
> Sent: Tuesday, August 01, 2006 1:40 PM
> To: java-user@lucene.apache.org
> Subject: Search with accents
>
> Hello there,
>
> I have a Brazilian Portuguese index, which has been analyzed with
> BrazilianAnalyzer.
> When searching for words with accents, however, they're
> not found. For instance, if the index contains some text with the
> word "maçã" and I search for that very word, I get no hits, but if I
> search for "maca" (which is another Portuguese word), the document
> containing "maçã" is found.
>
> I've seen posts in the archive indicating that I should use
> ISOLatin1AccentFilter to handle this, but I don't quite see how:
> should I leave indexing as it is and use this filter only for search
> queries, or should I apply it in both cases?
>
> Thank you,
> Eduardo Cordeiro
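Both questions in this thread come down to one rule: a query term only matches if analysis turns it into exactly the string that was stored in the index. So whatever accent handling is used (for example ISOLatin1AccentFilter, which replaces accented ISO Latin-1 characters with unaccented equivalents) generally belongs in both the indexing and the query analyzer. The following is a minimal plain-JDK sketch of that idea, not Lucene code and not the filter's actual implementation: the class `AccentFoldingDemo` and its `fold` and `indexTerms` helpers are hypothetical, and `java.text.Normalizer` (Java 6 or later) stands in for ISOLatin1AccentFilter by decomposing each character and dropping the combining accent marks, which behaves similarly for accented letters like the ones in this thread.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class: it only imitates accent folding, it is not Lucene.
class AccentFoldingDemo {

    // Lowercase a token and strip its diacritics, roughly what a
    // LowerCaseFilter followed by ISOLatin1AccentFilter would produce
    // (assumption: NFD + mark removal as a stand-in for the real filter).
    static String fold(String token) {
        // NFD splits e.g. a c-cedilla into 'c' plus a combining cedilla.
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        // Drop all combining marks (accents, cedillas, tildes), then lowercase.
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Hypothetical "index": the set of folded, whitespace-separated terms.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(fold(t));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // The accented word from the thread, written with Unicode escapes.
        Set<String> index = indexTerms("Comprei uma ma\u00e7\u00e3 vermelha");

        // Query folded the same way as the index: the term is found.
        System.out.println(index.contains(fold("ma\u00e7\u00e3")));  // true
        // Query left unfolded: the accented form was never indexed.
        System.out.println(index.contains("ma\u00e7\u00e3"));        // false
        // Folding also conflates it with the distinct word "maca".
        System.out.println(index.contains(fold("maca")));            // true
    }
}
```

The last line shows the trade-off of folding: "maçã" and "maca" become the same term, so with a folded index either query returns documents containing either word.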