lucene-java-user mailing list archives

From "Steven A Rowe" <>
Subject RE: Problems when changing stoplist file
Date Thu, 11 Sep 2008 16:00:04 GMT
Hi Marie,

On 09/11/2008 at 4:03 AM, Marie-Christine Plogmann wrote:
> I am currently using the demo class IndexFiles to index some
> corpus. I have replaced the Standard by a GermanAnalyzer.
> Here, indexing works fine.
> But if i specify a different stopword list that should be
> used, the tokenization doesn't seem to work properly. Mostly
> some letters are missing at the end. Has somebody encountered
> a similar problem? What could be the problem?

Are you sure that this only occurs after you change the stopword list?

I assume you're using the GermanAnalyzer in contrib/; it constructs an analysis
pipeline consisting of StandardTokenizer, StandardFilter, LowerCaseFilter,
StopFilter, and then GermanStemFilter, which invokes GermanStemmer <>,
which is an implementation of the stemming algorithm described in the paper
linked from here:

A basic question to get out of the way: Are you aware that the stemming operation removes
letters from the end of some words?
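
To illustrate what I mean, here is a rough, self-contained sketch of that kind of pipeline: tokenize, lowercase, drop stopwords, then strip suffixes. It does not use the Lucene classes at all. The class name ToyGermanPipeline, the suffix list and the toyStem() rule are my own simplifications for illustration and are NOT the algorithm GermanStemmer implements. The point is only that a stemming step shortens tokens in the same way whichever stopword list is in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyGermanPipeline {

    // Hypothetical suffix list, longest first. The real GermanStemmer
    // applies a more elaborate rule set; this is only a stand-in.
    private static final String[] SUFFIXES =
        {"ern", "en", "er", "es", "em", "e", "s", "n"};

    // Strip the first matching suffix, keeping at least 3 characters.
    static String toyStem(String term) {
        for (String suffix : SUFFIXES) {
            int stemLength = term.length() - suffix.length();
            if (term.endsWith(suffix) && stemLength >= 3) {
                return term.substring(0, stemLength);
            }
        }
        return term;
    }

    // Tokenize on non-letter/non-digit runs, lowercase, remove stopwords,
    // then stem whatever is left.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            String term = raw.toLowerCase(Locale.GERMAN);
            if (stopWords.contains(term)) {
                continue; // stop filter runs before the stemmer
            }
            terms.add(toyStem(term));
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "Die Kinder spielen auf den Feldern";
        Set<String> stoplistA = new HashSet<>(Arrays.asList("die", "auf", "den"));
        Set<String> stoplistB = new HashSet<>(Arrays.asList("die"));

        // Prints [kind, spiel, feld]
        System.out.println(analyze(text, stoplistA));
        // Prints [kind, spiel, auf, den, feld]
        System.out.println(analyze(text, stoplistB));
    }
}
```

With either stoplist, "Kinder" comes out as "kind" and "Feldern" as "feld". Only the set of surviving tokens changes. If you see the same thing with the real analyzer, the missing letters are the stemmer's doing rather than the stoplist's.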

