Message-ID: <8220B8D6B56A9A4E8480489F38E42529E17CDD@rw-msg-02.broadvision.com>
From: "Zhang, Lisheng" <Lisheng.Zhang@broadvision.com>
To: "'java-user@lucene.apache.org'" <java-user@lucene.apache.org>
Subject: RE: Search with accents
Date: Tue, 1 Aug 2006 16:40:01 -0700 

Hi,

In this case I guess we need to find out exactly
what BrazilianAnalyzer does to the input string:

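// needs org.apache.lucene.analysis.*, org.apache.lucene.analysis.br.BrazilianAnalyzer, java.io.StringReader
BrazilianAnalyzer braAnalyzer = new BrazilianAnalyzer();
TokenStream ts1 = braAnalyzer.tokenStream("text", new StringReader(queryStr));
// print each token BrazilianAnalyzer produces (Lucene 2.0 TokenStream API)
for (Token t = ts1.next(); t != null; t = ts1.next()) {
    System.out.println(t.termText());
}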

And likewise check exactly what ISOLatin1AccentFilter does:

WhitespaceAnalyzer wsAnalyzer = new WhitespaceAnalyzer();
TokenStream tmpts = wsAnalyzer.tokenStream("text", new StringReader(queryStr));
TokenStream ts2 = new ISOLatin1AccentFilter(tmpts);
// print each token after accent folding
for (Token t = ts2.next(); t != null; t = ts2.next()) {
    System.out.println(t.termText());
}

Then we can see what is wrong with ts1, and whether ts2
does a better job. I have never used ISOLatin1AccentFilter
before, so I am not sure this is really the right way to
test it; I am merely suggesting one.
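For illustration only, here is a rough, stdlib-only approximation of the kind of folding an accent filter performs. It is an assumption, not ISOLatin1AccentFilter's actual code: it uses java.text.Normalizer (Java 6 and later) to decompose accented characters and drop the combining marks, and the class name AccentFoldDemo is hypothetical. It also illustrates why the same folding has to be applied both when indexing and when parsing queries:

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: approximates accent folding with java.text.Normalizer.
// This is NOT Lucene's ISOLatin1AccentFilter, only an illustration of the idea.
public class AccentFoldDemo {

    // Decompose to NFD (e.g. U+00E7 "c with cedilla" -> "c" + U+0327 combining cedilla),
    // then strip every combining mark, leaving only the base letters.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String accented = "ma\u00e7\u00e3"; // "maçã", escaped to avoid source-encoding issues

        // Simulated "index": terms are folded at indexing time.
        Set<String> index = new HashSet<String>();
        index.add(fold(accented));

        // Query folded with the same function: matches the indexed term.
        System.out.println(index.contains(fold(accented))); // true
        // Query left unfolded: no match, i.e. the "no hits" symptom.
        System.out.println(index.contains(accented));       // false
        // An unaccented query folds to the same term, so it matches too.
        System.out.println(index.contains(fold("maca")));   // true
    }
}
```

Folding only one side reproduces the mismatch; folding both sides reduces "maçã" and "maca" to the same term. So whatever accent-stripping chain is chosen would presumably need to be used both for the IndexWriter and for the QueryParser.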

Best regards, Lisheng

-----Original Message-----
From: Eduardo S. Cordeiro [mailto:escordeiro@gmail.com]
Sent: Tuesday, August 01, 2006 2:34 PM
To: java-user@lucene.apache.org
Subject: Re: Search with accents


Yes...here's how I create my QueryParser:

QueryParser parser = new QueryParser("text", new BrazilianAnalyzer());

2006/8/1, Zhang, Lisheng <Lisheng.Zhang@broadvision.com>:
> Hi,
>
> Have you used the same BrazilianAnalyzer when
> searching?
>
> Best regards, Lisheng
>
> -----Original Message-----
> From: Eduardo S. Cordeiro [mailto:escordeiro@gmail.com]
> Sent: Tuesday, August 01, 2006 1:40 PM
> To: java-user@lucene.apache.org
> Subject: Search with accents
>
>
> Hello there,
>
> I have a Brazilian Portuguese index, which has been analyzed with
> BrazilianAnalyzer. When searching for words with accents, however, they're
> not found -- for instance, if the index contains some text with the
> word "maçã" and I search for that very word, I get no hits, but if I
> search "maca" (which is another Portuguese word) then the document
> containing "maçã" is found.
>
> I've seen posts in the archive indicating that I should use
> ISOLatin1AccentFilter to handle this, but I don't quite see how:
> should I leave indexing as it is and use this filter only for search
> queries, or should I apply it in both cases?
>
> Thank you,
> Eduardo Cordeiro
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
