lucene-java-user mailing list archives

From Michael Sokolov <soko...@ifactory.com>
Subject Re: Which stemmer?
Date Thu, 15 Nov 2012 01:00:49 GMT

>
> Does anyone have any experience with the stemmers?  I know that Porter 
> is what "everyone" uses.  Am I better off with KStemFilter (better 
> performance) or ??  Does anyone understand the differences between the 
> various stemmers and how to choose one over another?
We started off using Porter, then switched to KStem since Porter is way 
too aggressive for us (you get a lot of false matches), but KStem seemed 
a little bit too conservative, so we've had to augment it with synonyms.

For example, KStem doesn't reduce plurals in some cases where it seems 
it should. "bounds" was a problem: it won't match "bound," even though 
most other plurals match their singular form, and verbs get reduced to 
their stems as well. I thought maybe this was because there is also a 
homograph (spelled the same, different word) that is *not* a plural or 
verb ("bounds" as boundary, as in "out of bounds"?), but I'm not really 
sure how KStem's word lists were put together or what the goal was. 
Maybe this was just an oversight?
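
To make the guess above concrete, here is a toy sketch of how a 
dictionary-guarded stemmer could leave "bounds" alone, and how a synonym 
entry patches it. This is not KStem's actual algorithm or word list: 
LEXICON, SYNONYMS, toy_stem and analyze are all made-up names, and the 
assumption that "bounds" has its own lexicon entry is only my hypothesis.

```python
# Toy illustration only; NOT KStem's real rules or word lists.
# LEXICON and SYNONYMS are hypothetical data invented for this sketch.

# Words the stemmer treats as already valid. Assumption: "bounds" is
# listed as its own entry (e.g. because of "out of bounds").
LEXICON = {"bound", "bounds", "plural", "stem"}

# Manual overrides added on top of the stemmer's output.
SYNONYMS = {"bounds": "bound"}


def toy_stem(token: str) -> str:
    """Strip a trailing 's' only if the token is not itself a lexicon
    entry and the stripped form is one."""
    word = token.lower()
    if word in LEXICON:
        # Conservative step: a known word is returned unchanged.
        return word
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1]
    return word


def analyze(token: str) -> str:
    """Stem first, then apply the synonym override if there is one."""
    stemmed = toy_stem(token)
    return SYNONYMS.get(stemmed, stemmed)


if __name__ == "__main__":
    for t in ["plurals", "stems", "bounds", "bound"]:
        print(t, "->", toy_stem(t), "->", analyze(t))
```

With this setup the stemmer alone keeps "bounds" distinct from "bound", 
while the synonym step maps both to the same term. That is roughly the 
effect we were after when we added synonyms on top of KStem.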

YMMV; it depends a lot on what you are trying to achieve.

-Mike
