lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Muir <>
Subject Re: KStem custom lexicons configuration possible?
Date Mon, 20 Jun 2011 12:32:16 GMT
On Mon, Jun 20, 2011 at 8:23 AM, Lukáš Vlček <> wrote:
> Hi Robert,
> this sounds interesting I will look at it in more detail.
> However, I do not think this is really a general solution. If I understand
> StemmerOverrideFilter correctly (from a quick glance) it rely on the fact
> that you *know* exact term (the key in the map) in advance. In other words
> if I wanted to "fix" some term produced by Kstem filter I would have to know
> what is the product of the stemming in advance. Now, this means that if I
> switch to snowball or porter or other stemmer instead of KStem or simply
> update something else in the filtering chain then I am in trouble. Also if I
> understand correctly the original KStem implementation it can still get
> updates to lexicons which means that once these updates are ported to Java
> implementation it can again result in problem with existing override filter
> setup.
> More generally, is there any reason why lexicons are not configurable in

Because we have StemmerOverrideFilter and KeywordMarkerFilter.

look at the source code to Kstem: it uses maps and sets of exceptions,
this is what these filters provide in a general way
(StemmerOverrideFilter being the map, and KeywordMarkerFilter being
the set).

we added these to work across the board with all lucene stemmers for
this reason.

I don't understand your concerns at all to be honest, they make no
sense to me. If we "updated" kstem or any other algorithm: it would
break whatever you are doing either way. A hashmap is a hashmap.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message