lucene-dev mailing list archives

From Maxim Patramanskij <...@osua.de>
Subject Stemming stop words with GermanAnalyzer (Lucene 1.3RC1)
Date Thu, 17 Apr 2003 13:46:55 GMT
Hello all.

I've noticed the following behavior when using GermanAnalyzer (for both
indexing and searching) with the Lucene 1.3RC1 library.

When I searched for 'sein' (without quotes), I got no hits,
because 'sein' is one of the German stop words.

But searching for 'seiner', which is an inflected form of 'sein', does
return some hits. I'm not an expert in German, but maybe the
StopFilter in GermanAnalyzer should be applied after GermanStemFilter,
so that inflected forms of stop words are also kept out of the index:

public TokenStream tokenStream( String fieldName, Reader reader )
{
    TokenStream result = new StandardTokenizer( reader );
    result = new StandardFilter( result );
    // Original code applied the StopFilter here, before stemming:
    // result = new StopFilter( result, stoptable );
    result = new GermanStemFilter( result, excltable );
    // Proposed change: apply the StopFilter after stemming instead.
    result = new StopFilter( result, stoptable );
    return result;
}

Or should these inflected forms simply be added to the stop word list?
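
To make the effect of the filter order concrete, here is a minimal,
self-contained sketch that does not use Lucene at all. The class
FilterOrderSketch, the three-word stop list, and the toyStem method are
hypothetical stand-ins for illustration only; toyStem strips a trailing
"er" or "e" and is not the algorithm used by GermanStemFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterOrderSketch {
    // Hypothetical stop list; the real GermanAnalyzer list is much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("sein", "der", "und"));

    // Toy stand-in for a stemmer (an assumption, not GermanStemFilter):
    // strips a trailing "er" or "e" from tokens longer than four characters.
    static String toyStem(String token) {
        if (token.length() > 4 && token.endsWith("er")) {
            return token.substring(0, token.length() - 2);
        }
        if (token.length() > 4 && token.endsWith("e")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Current order: drop stop words first, then stem what is left.
    static List<String> stopThenStem(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                out.add(toyStem(t));
            }
        }
        return out;
    }

    // Proposed order: stem first, then drop tokens whose stem is a stop word.
    static List<String> stemThenStop(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String stemmed = toyStem(t);
            if (!STOP_WORDS.contains(stemmed)) {
                out.add(stemmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("seiner", "sein", "haus");
        System.out.println(stopThenStem(tokens)); // prints [sein, haus]
        System.out.println(stemThenStop(tokens)); // prints [haus]
    }
}
```

With the current order, 'seiner' passes the stop filter and is then
stemmed to 'sein', so it ends up in the index. With the proposed order it
is removed. One caveat under the same assumptions: the stop list holds
unstemmed words, so after the swap any stop word that the stemmer changes
would no longer match its list entry, unless the list is stemmed as well.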


Max



