lucene-dev mailing list archives

From Rafael Cunha de Almeida <almeida...@gmail.com>
Subject Re: [PATCH] Bug on BrazilianAnalyzer
Date Fri, 21 Nov 2008 18:46:30 GMT
On Mon, 17 Nov 2008 19:58:47 -0800
"Adriano Crestani" <adrianocrestani@apache.org> wrote:

> Hi Rafael,
> 
> What is your scenario?
> 
> Maybe it was defined this way so that it does not filter uppercase stop
> words. For example, the lowercase word "se" is a stopword, but the uppercase
> "SE" stands for "Sergipe", a Brazilian state, so it should not be filtered.

Suppose you are right. Passing the tokens through the LowerCaseFilter
can still be useful, especially if you don't care much about those
corner cases (the GermanAnalyzer, for instance, applies
LowerCaseFilter first). Because the class is final, one cannot inherit
from it and make that change when needed, which is unfortunate :-(.
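
To make the ordering issue concrete, here is a minimal plain-Java sketch. It
does not use any Lucene classes; the class name, the one-word stop set, and
the sample tokens are all assumptions made up for illustration. It only shows
how a case-sensitive stop filter treats "SE" differently depending on whether
lowercasing runs first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; none of these names are Lucene classes.
public class StopOrderSketch {
    // Hypothetical one-word stop set for the example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("se"));

    // Case-sensitive stop filtering: only exact matches are dropped.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Lowercases every token, as a lowercasing filter would.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Aracaju", "SE", "se", "chover");

        // Stop filtering without lowercasing: "SE" (Sergipe) survives.
        System.out.println(removeStopWords(tokens));            // [Aracaju, SE, chover]

        // Lowercasing first: "SE" becomes "se" and is dropped as well.
        System.out.println(removeStopWords(lowerCase(tokens))); // [aracaju, chover]
    }
}
```

Which order is right depends on whether keeping tokens such as "SE" matters
more than catching stop words regardless of case, which is why being able to
choose would help.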

I would like to see the whole stemmer and language analyzer API
changed to make it more flexible and extensible. As it stands, you
have to use these classes in the one predetermined way.

It would be nice if there were only one StemFilter and a Stemmer
interface that all stemmers implemented. The StemFilter would then
get its Stemmer as a constructor parameter. I see no reason for
BrazilianAnalyzer to be public.
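
The proposed design could look roughly like the sketch below. This is plain
Java, not the Lucene API: the Stemmer interface, the StemFilter class as
written here, and the toy TrailingSStemmer are all hypothetical names
introduced for illustration, and the filter works on a list of strings rather
than on a real token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed design; not the actual Lucene classes.
public class StemmerSketch {

    /** Hypothetical interface every language stemmer would implement. */
    interface Stemmer {
        String stem(String term);
    }

    /** Toy stemmer for illustration only; NOT the real Brazilian algorithm. */
    static class TrailingSStemmer implements Stemmer {
        @Override
        public String stem(String term) {
            // Strip a single trailing "s" from terms longer than one character.
            if (term.length() > 1 && term.endsWith("s")) {
                return term.substring(0, term.length() - 1);
            }
            return term;
        }
    }

    /** Single language-independent filter; the stemmer is a constructor parameter. */
    static class StemFilter {
        private final Stemmer stemmer;

        StemFilter(Stemmer stemmer) {
            this.stemmer = stemmer;
        }

        List<String> apply(List<String> tokens) {
            List<String> stemmed = new ArrayList<>();
            for (String token : tokens) {
                stemmed.add(stemmer.stem(token));
            }
            return stemmed;
        }
    }

    public static void main(String[] args) {
        StemFilter filter = new StemFilter(new TrailingSStemmer());
        System.out.println(filter.apply(Arrays.asList("casas", "livros", "mar")));
        // [casa, livro, mar]
    }
}
```

With this shape, supporting another language means writing one new Stemmer
implementation; the filter itself is untouched.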

Are you interested in this kind of change? Do you agree with it?
