lucene-dev mailing list archives

From Lukáš Vlček <lukas.vl...@gmail.com>
Subject Re: KStem custom lexicons configuration possible?
Date Mon, 20 Jun 2011 12:23:36 GMT
Hi Robert,

This sounds interesting; I will look at it in more detail.

However, I do not think this is really a general solution. If I understand
StemmerOverrideFilter correctly (from a quick glance), it relies on the fact
that you *know* the exact term (the key in the map) in advance. In other
words, if I wanted to "fix" some term produced by the KStem filter, I would
have to know the product of the stemming in advance. This means that if I
switch from KStem to Snowball, Porter, or another stemmer, or simply update
something else in the filtering chain, I am in trouble. Also, if I
understand correctly, the original KStem implementation can still receive
lexicon updates, so once those updates are ported to the Java
implementation, they could again cause problems for an existing override
filter setup.
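
To make the exact-key concern concrete, here is a toy sketch in plain Java.
It does not use the Lucene API: OverrideSketch, toyStem and analyze are
hypothetical names, and toyStem is a deliberately naive stand-in for a real
stemmer. It only models the general idea of an exact-match override map
consulted before stemming.

```java
import java.util.HashMap;
import java.util.Map;

public class OverrideSketch {
    // Hypothetical stand-in for a real stemmer: strips one trailing "s"
    // from terms longer than three characters.
    static String toyStem(String term) {
        return term.endsWith("s") && term.length() > 3
                ? term.substring(0, term.length() - 1)
                : term;
    }

    // Exact-match override first; fall back to the stemmer on a miss.
    static String analyze(String term, Map<String, String> overrides) {
        String forced = overrides.get(term);
        return forced != null ? forced : toyStem(term);
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("news", "news"); // protect "news" from the toy stemmer

        for (String t : new String[] {"news", "News", "cats"}) {
            System.out.println(t + " -> " + analyze(t, overrides));
        }
    }
}
```

In this model the override only fires when the incoming term matches a key
character for character, so "News" misses the entry for "news" unless an
earlier step in the chain lowercases it. That is the kind of coupling to the
rest of the chain I am worried about.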

More generally, is there any reason why the lexicons are not configurable in
the KStem filter?

Regards,
Lukas

On Mon, Jun 20, 2011 at 1:38 PM, Robert Muir <rcmuir@gmail.com> wrote:

> On Mon, Jun 20, 2011 at 7:19 AM, Lukáš Vlček <lukas.vlcek@gmail.com>
> wrote:
> > With an option to modify the internal lexicons, I would be able to adapt
> > KStem to work better for specific text corpora.
> > What do you think?
>
> please use StemmerOverrideFilter for this! it works with all stemmers,
> including this one.
