lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Which stemmer?
Date Thu, 15 Nov 2012 02:11:24 GMT
Another word set to try: invest, investing, investment, investments, 
invests, investor, invester, investors, investers.

Also, take a look at EnglishMinimalStemmer (EnglishMinimalStemFilterFactory) 
for minimal stemming.

See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemFilterFactory.html
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemmer.html
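
For a rough feel of what "minimal" means here: as far as I recall, the EnglishMinimalStemmer javadoc describes it as a plural-only "S-stemmer". The sketch below is a simplified, self-contained approximation of that kind of rule set, not Lucene's source. The class name MinimalPluralStemSketch and its stem method are hypothetical, and edge cases may differ from the real filter.

```java
final class MinimalPluralStemSketch {

    // Hypothetical helper (not a Lucene API): strips only plural-looking endings.
    static String stem(String word) {
        // Leave very short words and words not ending in "s" alone.
        if (word.length() < 3 || !word.endsWith("s")) {
            return word;
        }
        // "-us" and "-ss" endings are usually not plurals (e.g. "class").
        if (word.endsWith("us") || word.endsWith("ss")) {
            return word;
        }
        // "-ies" becomes "-y", except after "a" or "e" (assumed exception list).
        if (word.endsWith("ies")) {
            if (word.length() == 3 || word.endsWith("aies") || word.endsWith("eies")) {
                return word;
            }
            return word.substring(0, word.length() - 3) + "y";
        }
        // "-aes", "-ees", "-oes" are left unchanged in this sketch.
        if (word.endsWith("aes") || word.endsWith("ees") || word.endsWith("oes")) {
            return word;
        }
        // Everything else: drop the final "s".
        return word.substring(0, word.length() - 1);
    }

    public static void main(String[] args) {
        String[] words = {"invest", "investing", "investments", "invests", "investors", "countries"};
        for (String w : words) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Under these rules only plural-looking endings change ("investments" becomes "investment", "countries" becomes "country"), while "investing" passes through untouched.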

-- Jack Krupansky

-----Original Message----- 
From: Scott Smith
Sent: Wednesday, November 14, 2012 5:17 PM
To: java-user@lucene.apache.org
Subject: RE: Which stemmer?

Unfortunately, my "use case" is a customer who wants stemming but has very 
little idea of what that means, beyond thinking they want it.

I agree with your last comment.  So, here's my contribution:

   Original       porter        kstem      minStem
-----------  -----------  -----------  -----------
    country      countri      country      country
        run          run          run          run
       runs          run         runs          run
    running          run      running      running
       read         read         read         read
    reading         read      reading      reading
     reader       reader       reader       reader
association       associ  association  association
  associate       associ    associate    associate
    listing         list         list      listing
      water        water        water        water
    watered        water        water      watered
       sure         sure         sure         sure
     surely         sure       surely       surely
     fred's        fred'       fred's        fred'
      roses         rose         rose         rose

Still not sure which one to pick.  Porter is more aggressive.  Min stemmer 
is pretty minimal.  Perhaps the kstemmer is "just right" :-)

Cheers

Scott

-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com]
Sent: Wednesday, November 14, 2012 4:14 PM
To: java-user@lucene.apache.org
Subject: Re: Which stemmer?

What is your use case? If you don't have a specific use case in mind, try 
each of them with some common words that you expect will or won't be 
stemmed. If you have Solr, you can experiment interactively using the Solr 
Admin Analysis web page.
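
One way to run that kind of side-by-side comparison outside Solr is a tiny harness that applies each candidate to a word list and prints a table. This is a minimal sketch, assuming each stemmer can be wrapped as a String-to-String function. The class name, the table method, and the two stand-in stemmers ("none" and "dropS") are hypothetical placeholders, not Lucene classes; in practice you would replace them with wrappers around the real filters.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class StemmerComparisonSketch {

    // Builds a right-aligned table: one header row, then one row per word
    // showing the original followed by each stemmer's output.
    static String table(List<String> words, Map<String, UnaryOperator<String>> stemmers) {
        StringBuilder out = new StringBuilder(String.format("%12s", "Original"));
        for (String name : stemmers.keySet()) {
            out.append(String.format("%12s", name));
        }
        out.append('\n');
        for (String word : words) {
            out.append(String.format("%12s", word));
            for (UnaryOperator<String> stemmer : stemmers.values()) {
                out.append(String.format("%12s", stemmer.apply(word)));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder stemmers only (assumptions for illustration);
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, UnaryOperator<String>> stemmers = new LinkedHashMap<>();
        stemmers.put("none", w -> w);
        stemmers.put("dropS", w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w);

        List<String> words = Arrays.asList("run", "runs", "running", "roses");
        System.out.print(table(words, stemmers));
    }
}
```

Keeping the stemmers behind a plain function interface means the word list and the table layout stay the same no matter which implementations you plug in.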

It would be nice if the javadoc for each stemmer gave a handful of examples 
that illustrated how some common words are stemmed.

-- Jack Krupansky

-----Original Message-----
From: Scott Smith
Sent: Wednesday, November 14, 2012 10:55 AM
To: java-user@lucene.apache.org
Subject: Which stemmer?

Does anyone have any experience with the stemmers?  I know that Porter is 
what "everyone" uses.  Am I better off with KStemFilter (better performance) 
or something else?  Does anyone understand the differences between the 
various stemmers and how to choose one over another?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

