lucene-java-user mailing list archives

From Scott Smith <>
Subject RE: Which stemmer?
Date Thu, 15 Nov 2012 01:17:07 GMT
Unfortunately, my "use case" is a customer who wants stemming but knows very little about
what it means, beyond thinking they want it.

I agree with your last comment.  So, here's my contribution:

  Original      porter       kstem     minStem
   -------     -------     -------     -------
   country     countri     country     country
       run         run         run         run
      runs         run        runs         run
   running         run     running     running
      read        read        read        read
   reading        read     reading     reading
    reader      reader      reader      reader
association     associ association association
 associate      associ   associate   associate
   listing        list        list     listing
     water       water       water       water
   watered       water       water     watered
      sure        sure        sure        sure
    surely        sure      surely      surely
    fred's       fred'      fred's       fred'
     roses        rose        rose        rose

Still not sure which one to pick.  Porter is the most aggressive.  The minimal stemmer really is
minimal: in this list it only stripped a trailing "s".  Perhaps KStem is "just right" :-)
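To make "pretty minimal" concrete: as far as I know, the javadoc for Lucene's minimal English stemmer (used by EnglishMinimalStemFilter) describes it as a plural-only stemmer based on Donna Harman's "S-stemmer" from "How Effective Is Suffixing?". The following is a minimal, self-contained sketch of Harman's three rules as they are usually stated, not Lucene's source, so edge cases may differ. The class name `SStemmer` and the short-word guard are my own assumptions. Applied to the words in the table above, it reproduces the minStem column, including the odd `fred'`.

```java
class SStemmer {
    /**
     * Sketch of Harman's S-stemmer; only the FIRST matching rule is applied:
     *   1. "-ies" (but not "-eies" / "-aies")         -> "-y"
     *   2. "-es"  (but not "-aes" / "-ees" / "-oes")  -> "-e"
     *   3. "-s"   (but not "-us" / "-ss")             -> ""
     */
    static String stem(String w) {
        if (w.length() < 3) {
            return w; // assumption (not part of Harman's rules): leave very short tokens alone
        }
        if (w.endsWith("ies") && !w.endsWith("eies") && !w.endsWith("aies")) {
            return w.substring(0, w.length() - 3) + "y"; // ponies -> pony
        }
        if (w.endsWith("es") && !w.endsWith("aes") && !w.endsWith("ees") && !w.endsWith("oes")) {
            return w.substring(0, w.length() - 1); // drop only the "s": roses -> rose
        }
        if (w.endsWith("s") && !w.endsWith("us") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1); // runs -> run, fred's -> fred'
        }
        return w; // no rule matched: running, surely, watered stay unchanged
    }

    public static void main(String[] args) {
        String[] words = {"runs", "roses", "ponies", "shoes", "fred's", "running", "class"};
        for (String w : words) {
            System.out.printf("%-8s -> %s%n", w, stem(w));
        }
    }
}
```

Because it only ever touches plural-looking endings, "running", "reading" and "surely" pass through untouched, which matches the minStem column.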



-----Original Message-----
From: Jack Krupansky [] 
Sent: Wednesday, November 14, 2012 4:14 PM
Subject: Re: Which stemmer?

What is your use case? If you don't have a specific use case in mind, try each of them with
some common words that you expect will or won't be stemmed. If you have Solr, you can experiment
interactively using the Solr Admin Analysis web page.
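The experiment suggested above can also be scripted. Below is a minimal, dependency-free sketch in Java. Each stemmer is modeled as a `UnaryOperator<String>`, and for pairs of words you expect to match (or not), the harness counts how often a stemmer's conflation behaviour agrees with your expectation. `ConflationCheck`, `SAMPLE`, `NAIVE_S` and the expectations themselves are hypothetical. The two stemmers used here are toy stand-ins, not Lucene classes. To try the real filters, each operator would instead push the word through a tokenizer followed by `PorterStemFilter`, `KStemFilter` or `EnglishMinimalStemFilter` and read the result back from the `CharTermAttribute`. That wiring is left out so the sketch compiles without Lucene on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class ConflationCheck {
    /** Two words plus whether we expect a stemmer to reduce them to the same stem. */
    static final class Pair {
        final String a;
        final String b;
        final boolean shouldMatch;

        Pair(String a, String b, boolean shouldMatch) {
            this.a = a;
            this.b = b;
            this.shouldMatch = shouldMatch;
        }
    }

    // Hypothetical expectations; replace with words from your own domain.
    static final List<Pair> SAMPLE = List.of(
            new Pair("run", "runs", true),
            new Pair("read", "reading", true),
            new Pair("rose", "roses", true),
            new Pair("sure", "surely", false),
            new Pair("associate", "association", false));

    // Toy stand-in, NOT a Lucene stemmer: drops a trailing "s" unless the word ends in "ss".
    static final UnaryOperator<String> NAIVE_S =
            w -> w.endsWith("s") && !w.endsWith("ss") ? w.substring(0, w.length() - 1) : w;

    /** Number of pairs where "same stem or not" agrees with the expectation. */
    static int score(UnaryOperator<String> stemmer, List<Pair> pairs) {
        int agree = 0;
        for (Pair p : pairs) {
            boolean conflated = stemmer.apply(p.a).equals(stemmer.apply(p.b));
            if (conflated == p.shouldMatch) {
                agree++;
            }
        }
        return agree;
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<String>> stemmers = new LinkedHashMap<>();
        stemmers.put("none", w -> w); // baseline: no stemming at all
        stemmers.put("naive-s", NAIVE_S);
        for (Map.Entry<String, UnaryOperator<String>> e : stemmers.entrySet()) {
            System.out.printf("%-8s %d/%d%n", e.getKey(), score(e.getValue(), SAMPLE), SAMPLE.size());
        }
    }
}
```

On the sample pairs the baseline `none` agrees on 2 of 5 and `naive-s` on 4 of 5. Comparing real stemmers is then a matter of adding entries to the map and choosing pairs that reflect what your users expect to match.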

It would be nice if the javadoc for each stemmer gave a handful of examples that illustrated
how some common words are stemmed.

-- Jack Krupansky

-----Original Message-----
From: Scott Smith
Sent: Wednesday, November 14, 2012 10:55 AM
Subject: Which stemmer?

Does anyone have any experience with the stemmers?  I know that Porter is what "everyone"
uses.  Am I better off with KStemFilter (better performance) or something else?  Does anyone understand
the differences between the various stemmers and how to choose one over another? 

To unsubscribe, e-mail:
For additional commands, e-mail:
