lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Stemming stop words with GermanAnalyzer (Lucene 1.3RC1)
Date Mon, 13 Oct 2003 13:43:22 GMT
Maxim,

It doesn't make sense to stem words first and then remove the stop
words.  The current order, stop word filter followed by the stemmer, is
the correct one.  You could try using the SnowballAnalyzer with the
German stemmer.  You can find both in the Lucene Sandbox.

Otis

--- Maxim Patramanskij <max@osua.de> wrote:
> Hello all.
> 
> I've found the following behavior when using GermanAnalyzer (for both
> indexing and searching) with the Lucene 1.3RC1 library.
> 
> When I tried to search for 'sein' (without quotes) I got no hits,
> because 'sein' is one of the German stop words.
> 
> But searching for 'seiner', which is an inflected form of 'sein',
> returns some hits. I'm not an expert in the German language, but maybe
> the StopFilter of GermanAnalyzer should be applied after
> GermanStemFilter, so that inflected forms of stop words are also kept
> out of the index:
> 
> public TokenStream tokenStream( String fieldName, Reader reader )
>     {
>         TokenStream result = new StandardTokenizer( reader );
>         result = new StandardFilter( result );
>         // result = new StopFilter( result, stoptable );  original code
>         result = new GermanStemFilter( result, excltable );
>         result = new StopFilter( result, stoptable );
>         return result;
>     }
> 
> Or should these inflected forms just be added to the stop word list?
> 
> 
> Max
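
To make the difference between the two filter orders concrete, here is a
minimal, self-contained sketch that does not use Lucene at all.  Everything
in it is an assumption made for illustration: the class name FilterOrderDemo,
the three-word stop list, and the toy suffix-stripping stemmer, which is NOT
the algorithm used by GermanStemFilter.  It only shows how 'seiner' fares
under stop-then-stem, stem-then-stop, and a stop list extended with
inflected forms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterOrderDemo {
    // Hypothetical stop list; the real GermanAnalyzer list is much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("sein", "der", "und"));

    // Toy stemmer (an assumption, not GermanStemFilter): strips a trailing
    // "er" or "e" from terms longer than four characters.
    static String toyStem(String term) {
        if (term.length() > 4 && term.endsWith("er")) {
            return term.substring(0, term.length() - 2);
        }
        if (term.length() > 4 && term.endsWith("e")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Drops every term that appears in the given stop set.
    static List<String> removeStops(List<String> terms, Set<String> stops) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!stops.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    // Applies the toy stemmer to every term.
    static List<String> stem(List<String> terms) {
        List<String> stemmed = new ArrayList<>();
        for (String term : terms) {
            stemmed.add(toyStem(term));
        }
        return stemmed;
    }

    // Order used by GermanAnalyzer according to the thread: stop filter, then stemmer.
    static List<String> stopThenStem(List<String> terms) {
        return stem(removeStops(terms, STOP_WORDS));
    }

    // Order proposed in the quoted message: stemmer, then stop filter.
    static List<String> stemThenStop(List<String> terms) {
        return removeStops(stem(terms), STOP_WORDS);
    }

    // Alternative from the quoted message: keep the original order but add
    // the inflected forms to the stop list.
    static List<String> extendedStopThenStem(List<String> terms) {
        Set<String> extended = new HashSet<>(STOP_WORDS);
        extended.addAll(Arrays.asList("seiner", "seine"));
        return stem(removeStops(terms, extended));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("seiner", "sein", "haus", "kinder");
        System.out.println(stopThenStem(terms));         // [sein, haus, kind]
        System.out.println(stemThenStop(terms));         // [haus, kind]
        System.out.println(extendedStopThenStem(terms)); // [haus, kind]
    }
}
```

With stop-then-stem, 'seiner' passes the stop filter and is indexed under its
stem, which is why the search returns hits.  Stemming first removes it, but
it also removes any unrelated term whose stem happens to match a stop word;
extending the stop list avoids that side effect.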

