lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: German*Filter, Analyzer "cutting" off letters from (french) words...
Date Wed, 13 Apr 2011 08:51:12 GMT
On Wed, Apr 13, 2011 at 9:51 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
> What I really want to do is ignore German stop words such as "der", "die", "das", "ein", ...

GermanAnalyzer takes a stemExclusionSet; if you put those terms into
this set, the stemmer will not touch them. This should be in 3.1, I
think:

public GermanAnalyzer(Version matchVersion, Set<?> stopwords, Set<?> stemExclusionSet)
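
To make the roles of the two sets concrete, here is a minimal, self-contained sketch. It does not use Lucene at all: the class StemExclusionSketch, the analyze method and the toyStem helper are hypothetical names made up for illustration, and toyStem only strips a final -e, which is far less than a real German stemmer does. The assumption is simply that terms in the stop word set are dropped, while terms in the stem exclusion set are kept but passed through unstemmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StemExclusionSketch {

    // Toy stand-in for a stemmer (assumption): strips one trailing "e" from
    // terms longer than three characters. A real German stemmer does much more.
    static String toyStem(String term) {
        if (term.length() > 3 && term.endsWith("e")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Lowercases and splits on whitespace, drops stop words, and stems every
    // remaining term unless it is listed in stemExclusionSet.
    static List<String> analyze(String text, Set<String> stopwords, Set<String> stemExclusionSet) {
        List<String> out = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty() || stopwords.contains(raw)) {
                continue; // stop words are removed entirely
            }
            // excluded terms bypass the stemmer and are emitted unchanged
            out.add(stemExclusionSet.contains(raw) ? raw : toyStem(raw));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<String>(Arrays.asList("der", "die", "das", "ein"));
        Set<String> exclusions = new HashSet<String>(Arrays.asList("grande"));
        // prints [grande, schul]
        System.out.println(analyze("die grande Schule", stopwords, exclusions));
    }
}
```

With an empty exclusion set the same input would come out as [grand, schul], which is the "missing e" effect discussed in this thread.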

simon

>
>> -----Original Message-----
>> From: Robert Muir [mailto:rcmuir@gmail.com]
>> Sent: Tuesday, 12 April 2011 17:03
>> To: java-user@lucene.apache.org
>> Subject: Re: German*Filter, Analyzer "cutting" off letters from (french)
>> words...
>>
>> On Tue, Apr 12, 2011 at 8:46 AM, Clemens Wyss <clemensdev@mysign.ch>
>> wrote:
>> > Why so? Where have the e's gone?
>> >
>>
>> the e is being stemmed because it's a German suffix... all of the German stemming
>> algorithms remove final -e, as do all the French stemming algorithms.
>>
>> so I don't understand your problem.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


