Hi Robert,

This sounds interesting; I will look at it in more detail.

However, I do not think this is really a general solution. If I understand StemmerOverrideFilter correctly (from a quick glance), it relies on the fact that you *know* the exact term (the key in the map) in advance. In other words, if I wanted to "fix" some term produced by the KStem filter, I would have to know the output of the stemming in advance. This means that if I switch from KStem to Snowball, Porter or another stemmer, or simply change something else in the filtering chain, then I am in trouble. Also, if I understand the original KStem implementation correctly, its lexicons can still receive updates, which means that once those updates are ported to the Java implementation, they could again cause problems with an existing override filter setup.
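
To make the concern concrete, here is a rough plain-Java sketch of how I understand an override-then-stem chain. It is not the Lucene API: `OverrideChainSketch`, `analyze` and `toyStemmer` are hypothetical names, and the toy stemmer (strip a trailing "s") only stands in for KStem or Porter. As far as I can tell, the real filter achieves the "skip" by marking matched tokens as keywords so that the downstream stemmer leaves them alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class OverrideChainSketch {
    // Hypothetical toy stemmer standing in for KStem/Porter: strips a trailing "s".
    public static String toyStemmer(String term) {
        return term.length() > 3 && term.endsWith("s")
                ? term.substring(0, term.length() - 1)
                : term;
    }

    // Models the override-then-stem chain: if the incoming term has an entry
    // in the override map, that value is emitted and the stemmer is skipped.
    public static String analyze(String term, Map<String, String> overrides,
                                 UnaryOperator<String> stemmer) {
        String override = overrides.get(term);
        return override != null ? override : stemmer.apply(term);
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        // This entry exists only because we know the toy stemmer would turn "news" into "new".
        overrides.put("news", "news");
        System.out.println(analyze("news", overrides, OverrideChainSketch::toyStemmer)); // news
        System.out.println(analyze("cats", overrides, OverrideChainSketch::toyStemmer)); // cat
    }
}
```

The override entry is only there because I know what this particular stemmer does to "news"; plug in a different stemmer and the set of entries I need changes with it.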

More generally, is there any reason why the lexicons are not configurable in the KStem filter?
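
What I have in mind is something along the lines of the sketch below. Again this is plain Java with hypothetical names (`ConfigurableLexiconStemmer`, `stem`), not the actual KStem code: the lexicon and the exception table are supplied by the user, and the toy suffix rule only accepts a stripped form if it appears in that lexicon, which is roughly the dictionary-checking idea behind KStem.

```java
import java.util.Map;
import java.util.Set;

public class ConfigurableLexiconStemmer {
    // Base forms the stemmer is allowed to produce (hypothetical, user-supplied).
    private final Set<String> lexicon;
    // Irregular forms mapped directly to a base form (hypothetical, user-supplied).
    private final Map<String, String> exceptions;

    public ConfigurableLexiconStemmer(Set<String> lexicon, Map<String, String> exceptions) {
        this.lexicon = Set.copyOf(lexicon);
        this.exceptions = Map.copyOf(exceptions);
    }

    public String stem(String term) {
        // 1. Explicit exceptions win.
        String mapped = exceptions.get(term);
        if (mapped != null) {
            return mapped;
        }
        // 2. A term already in the lexicon is left unchanged.
        if (lexicon.contains(term)) {
            return term;
        }
        // 3. Toy suffix rule: strip a trailing "s" only if the result is a known base form.
        if (term.endsWith("s")) {
            String candidate = term.substring(0, term.length() - 1);
            if (lexicon.contains(candidate)) {
                return candidate;
            }
        }
        // 4. Otherwise leave the term alone.
        return term;
    }

    public static void main(String[] args) {
        ConfigurableLexiconStemmer stemmer = new ConfigurableLexiconStemmer(
                Set.of("cat", "news", "corpus"),
                Map.of("corpora", "corpus"));
        System.out.println(stemmer.stem("cats"));    // cat
        System.out.println(stemmer.stem("news"));    // news
        System.out.println(stemmer.stem("corpora")); // corpus
        System.out.println(stemmer.stem("dogs"));    // dogs ("dog" is not in the lexicon)
    }
}
```

Adapting to a specific corpus would then mean editing the lexicon itself, rather than maintaining overrides keyed on whatever a particular stemmer happens to output.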


On Mon, Jun 20, 2011 at 1:38 PM, Robert Muir <rcmuir@gmail.com> wrote:
On Mon, Jun 20, 2011 at 7:19 AM, Lukáš Vlček <lukas.vlcek@gmail.com> wrote:
> Having an option to modify internal lexicons I would be able to adapt the
> KStem to work better for specific text corpora.
> What do you think?

please use StemmerOverrideFilter for this! it works with all stemmers,
including this one.
