lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: [PATCH] Bug on BrazilianAnalyzer
Date Tue, 02 Dec 2008 12:05:26 GMT

Rafael,

Could you work these changes into a patch,  add a test case, and open  
a Jira issue?  Maybe first make the simple fixes (removing final,  
moving LowerCaseFilter up in the chain), and then as a 2nd issue this  
deeper refactoring of all StemFilters?  Thanks.
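
As a rough illustration of the deeper refactoring proposed in the quoted
thread below (one StemFilter that receives its Stemmer as a constructor
parameter), here is a minimal sketch. Everything in it is an assumption made
for illustration: the Stemmer interface, ToyPluralStemmer, and
GenericStemFilter are hypothetical names, not existing Lucene classes, and
the filter works on a plain list of strings rather than a TokenStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface; not part of the Lucene API discussed in this thread.
interface Stemmer {
    String stem(String term);
}

// Toy stemmer for illustration only: strips one trailing "s".
// It is NOT a real Brazilian (or any other language) stemmer.
class ToyPluralStemmer implements Stemmer {
    public String stem(String term) {
        if (term.length() > 1 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}

// A single generic filter: the language-specific logic lives entirely in
// the Stemmer passed to the constructor.
class GenericStemFilter {
    private final Stemmer stemmer;

    GenericStemFilter(Stemmer stemmer) {
        if (stemmer == null) {
            throw new IllegalArgumentException("stemmer must not be null");
        }
        this.stemmer = stemmer;
    }

    // Returns a new list with every token replaced by its stem.
    List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(stemmer.stem(token));
        }
        return out;
    }
}

class StemFilterSketch {
    public static void main(String[] args) {
        GenericStemFilter filter = new GenericStemFilter(new ToyPluralStemmer());
        System.out.println(filter.apply(Arrays.asList("casas", "livro")));
    }
}
```

With this shape, supporting another language would only mean writing another
Stemmer implementation; the filter class itself would stay unchanged.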

I agree the original issue (LowerCaseFilter coming after StopFilter)
is a bug, though does the BrazilianStemFilter mind if all tokens
coming in are now lowercased (I would assume not)?
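
To make the ordering problem concrete, here is a small sketch in plain Java.
It does not use the Lucene classes: stopFilter and lowerCaseFilter are
simplified stand-ins, and the stop set and sample tokens are made up for
illustration. It assumes the stop set holds lowercase entries only and that
stop-word matching is case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java stand-ins for the two filters; these are NOT the Lucene classes.
class FilterOrderSketch {
    // Hypothetical stop set containing lowercase entries only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("se", "de", "a"));

    // Drops tokens that match a stop word exactly (case-sensitive).
    static List<String> stopFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // Lowercases every token.
    static List<String> lowerCaseFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Se", "chover", "De", "casa");

        // Stop filter first: "Se" and "De" do not match the lowercase stop
        // set, so they survive and are only lowercased afterwards.
        System.out.println(lowerCaseFilter(stopFilter(tokens)));
        // prints [se, chover, de, casa]

        // Lowercase first: the stop filter now sees "se" and "de".
        System.out.println(stopFilter(lowerCaseFilter(tokens)));
        // prints [chover, casa]
    }
}
```

The second ordering also removes an uppercase "SE", which is the corner case
raised in the quoted thread, so the trade-off discussed there still applies.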

Mike

Adriano Crestani wrote:

> Hi Rafael,
>
> I kind of agree with you. Practically all the StemFilters have the
> same logic, so they might be combined into a single class. All
> StemFilters seem to have a setStemmer already; we could keep that
> and also allow passing the stemmer as a constructor parameter, like
> you said. I think you can create a JIRA issue and submit a patch for
> that; let's see what the Lucene members think about it :)
>
> Now, about the BrazilianAnalyzer being final, it's probably only
> because they wanted to improve runtime performance, since final
> classes can be faster when the JVM does not need to check for
> subclassing.
>
> Best Regards,
> Adriano Crestani Campos
>
> On Fri, Nov 21, 2008 at 2:15 PM, Rafael Cunha de Almeida
> <almeidaraf@gmail.com> wrote:
> On Fri, 21 Nov 2008 16:46:30 -0200
> Rafael Cunha de Almeida <almeidaraf@gmail.com> wrote:
>
> > On Mon, 17 Nov 2008 19:58:47 -0800
> > "Adriano Crestani" <adrianocrestani@apache.org> wrote:
> >
> > > Hi Rafael,
> > >
> > > What is your scenario?
> > >
> > > Maybe it was defined this way so that it does not filter
> > > uppercased stop words. For example, the lowercase word "se" is a
> > > stopword, but the uppercase "SE" stands for "Sergipe", a Brazilian
> > > state, so it should not be filtered.
> >
> > Suppose you are right, but passing it through the LowerCaseFilter
> > can be useful too, especially if you don't care much about those
> > corner cases (the GermanAnalyzer, for instance, passes through
> > LowerCaseFilter first). The class being final doesn't allow one to
> > inherit from it and make the changes if needed, which is
> > unfortunate :-(.
> >
> > I would like to see a change in the whole stemmer and language
> > analyzer API in order to make it more flexible and extensible. The
> > way it is, you have to use them in that predetermined way.
> >
> > It would be nice if there was only one StemFilter, a Stemmer
> > interface and all Stemmers were subclasses of that. Then, the
> > StemFilter should get its Stemmer as a constructor parameter. I see
> > no reason for BrazilianAnalyzer to be public.
>
> To be final, sorry. I was a bit tired when I wrote all that.
>
> > Are you interested in these kinds of changes? Do you agree with them?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



