lucene-dev mailing list archives

From Rafael Cunha de Almeida <almeida...@gmail.com>
Subject Re: [PATCH] Bug on BrazilianAnalyzer
Date Fri, 21 Nov 2008 22:15:40 GMT
On Fri, 21 Nov 2008 16:46:30 -0200
Rafael Cunha de Almeida <almeidaraf@gmail.com> wrote:

> On Mon, 17 Nov 2008 19:58:47 -0800
> "Adriano Crestani" <adrianocrestani@apache.org> wrote:
> 
> > Hi Rafael,
> > 
> > What is your scenario?
> > 
> > Maybe it was defined this way so that it does not filter uppercase stop
> > words. For example, the lowercase word "se" is a stop word, but the uppercase
> > "SE" stands for "Sergipe", a Brazilian state, so it should not be filtered.
> 
> Suppose you are right, but passing it through the LowerCaseFilter can
> be useful too, especially if you don't care much about those corner
> cases (the GermanAnalyzer, for instance, passes through
> LowerCaseFilter first). Because the class is final, one cannot inherit
> from it and make the changes when needed, which is unfortunate :-(.
> 
> I would like to see a change in the whole API of the stemmers and
> language analyzers in order to make it more flexible and extensible. As
> it stands, you have to use them in that predetermined way.
> 
> It would be nice if there were only one StemFilter and a Stemmer
> interface that all stemmers implemented. The StemFilter would then
> get its Stemmer as a constructor parameter. I see no reason for
> BrazilianAnalyzer to be public.

To be final, sorry. I was a bit tired when I wrote all that.
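
The single-StemFilter idea quoted above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Stemmer` interface, `ToyPluralStemmer`, and this `StemFilter` are hypothetical names, not existing Lucene classes, and tokens are modeled as a plain `List<String>` instead of a Lucene TokenStream so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: maps one token to its stem.
interface Stemmer {
    String stem(String term);
}

// Toy stemmer for illustration only; NOT the real Brazilian stemming algorithm.
final class ToyPluralStemmer implements Stemmer {
    @Override
    public String stem(String term) {
        // Strip a single trailing "s" from words longer than three characters.
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}

// One language-agnostic filter that receives its Stemmer through the constructor.
// Deliberately not final, so callers can extend it if they need to.
class StemFilter {
    private final Stemmer stemmer;

    StemFilter(Stemmer stemmer) {
        this.stemmer = stemmer;
    }

    // Applies the injected stemmer to every token, preserving order.
    List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(stemmer.stem(token));
        }
        return out;
    }
}

class StemFilterSketch {
    public static void main(String[] args) {
        StemFilter filter = new StemFilter(new ToyPluralStemmer());
        // Prints [casa, livro, sol]
        System.out.println(filter.filter(Arrays.asList("casas", "livros", "sol")));
    }
}
```

With this shape, supporting another language would mean writing a new `Stemmer` implementation rather than a new filter class.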

> Are you interested in these kinds of changes? Do you agree with them?
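
On the LowerCaseFilter point earlier in the thread, the trade-off comes down to the order of two steps. The sketch below shows it with plain Java collections instead of Lucene filters; the one-word stop list and the helper names `removeStopWords` and `lowerCase` are assumptions made for the example, not part of any analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordOrderSketch {
    // Hypothetical one-entry stop list; real analyzers ship much longer lists.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("se"));

    // Drops tokens that exactly (case-sensitively) match a stop word.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // Lower-cases every token, independent of the default locale.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Aracaju", "SE", "se", "chover");

        // Stop filtering first: "SE" (Sergipe) survives, only "se" is dropped.
        // Prints [aracaju, se, chover]
        System.out.println(lowerCase(removeStopWords(tokens)));

        // Lower-casing first: "SE" is folded into "se" and both are dropped.
        // Prints [aracaju, chover]
        System.out.println(removeStopWords(lowerCase(tokens)));
    }
}
```

Neither order is right for every application, which is why being able to subclass or reconfigure the analyzer matters.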

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

