lucene-dev mailing list archives

From "Adriano Crestani" <adrianocrest...@apache.org>
Subject Re: [PATCH] Bug on BrazilianAnalyzer
Date Tue, 18 Nov 2008 03:58:47 GMT
Hi Rafael,

What is your scenario?

Maybe it was defined this way so that it does not filter uppercase stop words.
For example, the lowercase word "se" is a stop word, but the uppercase
"SE" stands for "Sergipe", a Brazilian state, so it should not be filtered.
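
To make the trade-off concrete, here is a minimal sketch in plain Java. It
does not use any Lucene classes: the class name, the helper methods and the
two-word stop set are all made up for illustration, and the stop check is
assumed to be an exact, case-sensitive match against lowercase entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the analyzer's filter chain (not Lucene code).
class StopOrderSketch {
    // Sample lowercase stop words (assumption: not the analyzer's real list).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("se", "em"));

    // Case-sensitive stop word removal: only exact matches are dropped.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Lowercases every token, independent of the default locale.
    static List<String> lowerCase(List<String> tokens) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ROOT));
        }
        return lowered;
    }

    // Order before the patch: stop words are removed, then tokens are lowercased.
    static List<String> stopThenLower(List<String> tokens) {
        return lowerCase(removeStopWords(tokens));
    }

    // Order after the patch: tokens are lowercased, then stop words are removed.
    static List<String> lowerThenStop(List<String> tokens) {
        return removeStopWords(lowerCase(tokens));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Se", "chover", "em", "SE");
        System.out.println(stopThenLower(tokens)); // [se, chover, se]
        System.out.println(lowerThenStop(tokens)); // [chover]
    }
}
```

With the first order, "Se" at the start of a sentence and the abbreviation
"SE" both survive and end up indexed as "se". With the second order both are
dropped, so the abbreviation can no longer be searched for.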

Best Regards,
Adriano Crestani

On Mon, Nov 17, 2008 at 3:39 PM, Rafael Cunha de Almeida <
almeidaraf@gmail.com> wrote:

> Following is the patch for what I think is a bug in the
> BrazilianAnalyzer. The default stop word list is all in lowercase, so
> it will only work if the LowerCaseFilter comes first or if the
> StopFilter is set to ignore case.
>
> Since the LowerCaseFilter is instantiated anyway, I just changed its
> order. If there's some problem with that order, then please consider
> setting the StopFilter to ignore case.
>
> Index: BrazilianAnalyzer.java
> ===================================================================
> --- BrazilianAnalyzer.java      (revision 718407)
> +++ BrazilianAnalyzer.java      (working copy)
> @@ -131,10 +131,9 @@
>        public final TokenStream tokenStream(String fieldName, Reader reader) {
>                TokenStream result = new StandardTokenizer( reader );
>                result = new StandardFilter( result );
> +               result = new LowerCaseFilter( result );
>                result = new StopFilter( result, stoptable );
>                result = new BrazilianStemFilter( result, excltable );
> -               // Convert to lowercase after stemming!
> -               result = new LowerCaseFilter( result );
>                return result;
>        }
>  }
