lucene-dev mailing list archives

From Lukáš Vlček <lukas.vl...@gmail.com>
Subject Re: KStem custom lexicons configuration possible?
Date Mon, 20 Jun 2011 11:19:02 GMT
Maybe I should show some examples where I think custom configuration could be
useful. Here are two:

1) As of now, KStem conflates both "connector" and "connected" to the same
term, "connect".
2) Conversely, it does not conflate "transaction" and "transactions" to the
same term.

With an option to modify the internal lexicons, I would be able to adapt
KStem to work better for specific text corpora.
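
To make the idea concrete, here is a minimal Java sketch of the kind of
configuration I have in mind. It is not KStem's or Lucene's actual API: the
class LexiconOverrideStemmer and its put/stem methods are hypothetical names,
and the delegate used in main is only a stand-in that hard-codes the two
behaviours listed above instead of calling the real stemmer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch: look a word up in a user-supplied lexicon first,
 * and fall back to an existing stemmer only when there is no custom entry.
 */
final class LexiconOverrideStemmer {
    // Custom lexicon: word -> stem chosen by the user.
    private final Map<String, String> overrides = new HashMap<>();
    // The underlying stemmer, abstracted as a String -> String function.
    private final UnaryOperator<String> delegate;

    LexiconOverrideStemmer(UnaryOperator<String> delegate) {
        this.delegate = delegate;
    }

    /** Registers a custom entry; returns this so calls can be chained. */
    LexiconOverrideStemmer put(String word, String stem) {
        overrides.put(word, stem);
        return this;
    }

    /** Returns the custom stem if one is registered, else the delegate's result. */
    String stem(String word) {
        String custom = overrides.get(word);
        return custom != null ? custom : delegate.apply(word);
    }

    public static void main(String[] args) {
        // Stand-in for the real stemmer (assumption): it only mimics the
        // two behaviours described in this message.
        UnaryOperator<String> baseline = w ->
            (w.equals("connector") || w.equals("connected")) ? "connect" : w;

        LexiconOverrideStemmer stemmer = new LexiconOverrideStemmer(baseline)
            .put("connector", "connector")        // keep "connector" distinct
            .put("transactions", "transaction");  // conflate the plural

        for (String w : new String[] {"connector", "connected",
                                      "transaction", "transactions"}) {
            System.out.println(w + " -> " + stemmer.stem(w));
        }
    }
}
```

With these two entries "connector" is left alone, "connected" still goes
through the fallback, and "transactions" maps to "transaction". The lookup
happens before stemming, so the underlying algorithm stays untouched.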

What do you think?

Regards,
Lukas

On Mon, Jun 20, 2011 at 12:55 PM, Lukáš Vlček <lukas.vlcek@gmail.com> wrote:

> Hi,
>
> Is there any API in the KStem filter for lexicon configuration?
>
> As far as I understand, the original code loads its lexicons from files at
> startup (see http://lexicalresearch.com/kstem-doc.txt). The author
> (Robert Krovetz) lists the possibility of modifying the lexicons among the
> advantages of KStem over other stemmers.
>
> Do people not need it? Would it be a useful addition for the KStem filter
> to allow custom lexicon configuration in its API?
>
> Regards,
> Lukas
>
> Note: Big kudos to all who participated in bringing KStem into Lucene!
>
