lucene-java-user mailing list archives

From "Zhang, Lisheng" <Lisheng.Zh...@broadvision.com>
Subject RE: Search with accents
Date Tue, 01 Aug 2006 23:40:01 GMT
Hi,

In this case I guess we need to find out exactly what
BrazilianAnalyzer does to the input string:

BrazilianAnalyzer braAnalyzer = new BrazilianAnalyzer();
TokenStream ts1 = braAnalyzer.tokenStream("text", new StringReader(queryStr));
... // inspect the tokens in ts1: what does BrazilianAnalyzer produce?

Also, what exactly ISOLatin1AccentFilter does:

WhitespaceAnalyzer wsAnalyzer = new WhitespaceAnalyzer();
TokenStream tmpts = wsAnalyzer.tokenStream("text", new StringReader(queryStr));
TokenStream ts2 = new ISOLatin1AccentFilter(tmpts);
... // inspect the tokens in ts2: what does ISOLatin1AccentFilter produce?

This should show what is wrong with ts1 and whether ts2
does a better job. I have never used ISOLatin1AccentFilter
before, so I am not sure this is really the right way to
test it; I merely suggest it as one way to test.
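
To make the idea concrete, here is a minimal sketch of what accent
folding does to a term. It uses only the JDK's java.text.Normalizer as
a stand-in and is not ISOLatin1AccentFilter itself; the class and
method names (AccentFoldingSketch, fold) are made up for this example,
and the "index" is just a set of strings. It assumes the terms were
folded at indexing time, which the reported symptom suggests but which
the test above would need to confirm.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

public class AccentFoldingSketch {

    // Stand-in for an accent-stripping filter (an assumption, not Lucene code):
    // decompose each character (NFD), then drop the combining marks.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // The accented word from the original question, written with
        // Unicode escapes so the source file encoding does not matter.
        String word = "ma\u00e7\u00e3";

        // Pretend index: terms were folded when the document was indexed.
        Set<String> indexedTerms = new HashSet<>();
        indexedTerms.add(fold(word));

        // An unfolded query term does not match the folded index term.
        System.out.println("raw query hits:    " + indexedTerms.contains(word));
        // Folding the query the same way makes the terms line up again.
        System.out.println("folded query hits: " + indexedTerms.contains(fold(word)));
    }
}
```

If the real token streams behave like this, the point to check is
whether the query side goes through the same folding as the index side.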

Best regards, Lisheng
 

-----Original Message-----
From: Eduardo S. Cordeiro [mailto:escordeiro@gmail.com]
Sent: Tuesday, August 01, 2006 2:34 PM
To: java-user@lucene.apache.org
Subject: Re: Search with accents


Yes...here's how I create my QueryParser:

QueryParser parser = new QueryParser("text", new BrazilianAnalyzer());

2006/8/1, Zhang, Lisheng <Lisheng.Zhang@broadvision.com>:
> Hi,
>
> Have you used the same BrazilianAnalyzer when
> searching?
>
> Best regards, Lisheng
>
> -----Original Message-----
> From: Eduardo S. Cordeiro [mailto:escordeiro@gmail.com]
> Sent: Tuesday, August 01, 2006 1:40 PM
> To: java-user@lucene.apache.org
> Subject: Search with accents
>
>
> Hello there,
>
> I have a Brazilian Portuguese index, which has been analyzed with
> BrazilianAnalyzer. When searching words with accents, however, they're
> not found -- for instance, if the index contains some text with the
> word "maçã" and I search for that very word, I get no hits, but if I
> search "maca" (which is another Portuguese word) then the document
> containing "maçã" is found.
>
> I've seen posts in the archive indicating that I should use
> ISOLatin1AccentFilter to handle this, but I don't quite see how:
> should I leave indexing as it is and use this filter only for search
> queries or should I apply it in both cases?
>
> Thank you,
> Eduardo Cordeiro
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
