lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Extending org.apache.lucene.analysis.br.BrazilianAnalyzer to discard numeric tokens
Date Mon, 07 Feb 2011 16:44:31 GMT
On Sun, Feb 6, 2011 at 3:28 PM, Georger Araujo <georger.araujo@gmail.com> wrote:
> Hi,
> I started using Lucene a few weeks ago, and I must say I'm amazed. Hats off
> to the developers and the community!
> I'd like to write a custom analyzer whose only difference to
> org.apache.lucene.analysis.br.BrazilianAnalyzer is that I want it to discard
> numeric tokens from the input. I've looked at the code and also at the
> discussion in [1], but I'm lost about what is the simplest/cleanest way to
> go.
> What do you think?

Hi, in general the supplied analyzers are best treated as
general-purpose examples.

So I would write your own analyzer. It would do what BrazilianAnalyzer
does, except with a tokenizer that discards numbers (such as
LowerCaseTokenizer) in place of StandardTokenizer: something like
LowerCaseTokenizer + BrazilianStemFilter + a StopFilter loaded with the
Brazilian stopwords.
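
LowerCaseTokenizer drops numbers because it divides text at non-letters and lowercases what remains, so a digit can never be part of a token. The plain-Java sketch below imitates that rule without using Lucene at all. The class name LetterOnlyTokens and the method tokenize are hypothetical names chosen for illustration, not Lucene APIs, and the real tokenizer may differ in details such as maximum token length.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT Lucene code.
// LetterOnlyTokens and tokenize are hypothetical names.
final class LetterOnlyTokens {

    // Splits text into maximal runs of letters and lowercases each run.
    // Digits, punctuation, and whitespace all act as separators.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Avenida Paulista 1578, Sao Paulo 2011"));
        System.out.println(tokenize("abc123def"));
    }
}
```

One consequence worth checking against your data: a mixed token such as "abc123def" is not discarded as a whole. It is split into "abc" and "def", because the digits only act as a boundary.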
