lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Which stemmer?
Date Thu, 15 Nov 2012 16:38:39 GMT
I'd make it easy for myself. Generate (programmatically) a list like the one
you showed, but for a _lot_ more terms, send it to your customer, and let
_them_ pick. Unfortunately, the customer has no idea what "aggressive" means
(for that matter, I don't know how Porter handles specific words either; I
always have to try it). By putting concrete examples in front of them, and
framing it as "all the words that reduce to the same stem will be considered
matches and be returned", you can give them enough info to make a choice.
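
To make the "words that reduce to the same stem" framing concrete, here is a minimal sketch of such a report generator. It is only an illustration: the class name StemReport and the toyStem function are hypothetical stand-ins, and toyStem is not any of the Lucene stemmers discussed in this thread. In practice you would pass in a function that runs each word through the stemmer you are evaluating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class StemReport {

    // Hypothetical stand-in for a real stemmer: strips one trailing "s".
    // Replace with a function that applies the stemmer under evaluation.
    static String toyStem(String word) {
        boolean plural = word.length() > 3 && word.endsWith("s") && !word.endsWith("ss");
        return plural ? word.substring(0, word.length() - 1) : word;
    }

    // Groups words by the stem the given function produces, keeping input order.
    static Map<String, List<String>> groupByStem(List<String> words,
                                                 Function<String, String> stemmer) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String w : words) {
            groups.computeIfAbsent(stemmer.apply(w), k -> new ArrayList<>()).add(w);
        }
        return groups;
    }

    // Renders one line per stem: the stem, then every word that maps to it.
    static String render(Map<String, List<String>> groups) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            sb.append(String.format("%-12s <- %s%n", e.getKey(),
                    String.join(", ", e.getValue())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "invest", "invests", "investor", "investors", "runs", "running");
        System.out.print(render(groupByStem(words, StemReport::toyStem)));
    }
}
```

Each output line lists a stem followed by every word that would match it, so with the toy function "invest" and "invests" land on one line while "running" stays on its own. Running the same word list once per candidate stemmer gives the customer one report per option to compare.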

FWIW,
Erick


On Wed, Nov 14, 2012 at 9:11 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> Another word set to try: invest, investing, investment, investments,
> invests, investor, invester, investors, investers.
>
> Also, take a look at EnglishMinimalStemmer
> (EnglishMinimalStemFilterFactory) for minimal stemming.
>
> See:
> http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemFilterFactory.html
> http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemmer.html
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Scott Smith
> Sent: Wednesday, November 14, 2012 5:17 PM
> To: java-user@lucene.apache.org
> Subject: RE: Which stemmer?
>
>
> Unfortunately, my "use case" is a customer who wants stemming, but has
> very little knowledge of what that means except they think they want it.
>
> I agree with your last comment.  So, here's my contribution:
>
>    Original       porter        kstem      minStem
> -----------  -----------  -----------  -----------
>     country      countri      country      country
>         run          run          run          run
>        runs          run         runs          run
>     running          run      running      running
>        read         read         read         read
>     reading         read      reading      reading
>      reader       reader       reader       reader
> association       associ  association  association
>   associate       associ    associate    associate
>     listing         list         list      listing
>       water        water        water        water
>     watered        water        water      watered
>        sure         sure         sure         sure
>      surely         sure       surely       surely
>      fred's        fred'       fred's        fred'
>       roses         rose         rose         rose
>
> Still not sure which one to pick.  Porter is more aggressive.  Min stemmer
> is pretty minimal.  Perhaps the kstemmer is "just right" :-)
>
> Cheers
>
> Scott
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack@basetechnology.com]
> Sent: Wednesday, November 14, 2012 4:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: Which stemmer?
>
> What is your use case? If you don't have a specific use case in mind, try
> each of them with some common words that you expect will or won't be
> stemmed. If you have Solr, you can experiment interactively using the Solr
> Admin Analysis web page.
>
> It would be nice if the javadoc for each stemmer gave a handful of
> examples that illustrated how some common words are stemmed.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Scott Smith
> Sent: Wednesday, November 14, 2012 10:55 AM
> To: java-user@lucene.apache.org
> Subject: Which stemmer?
>
> Does anyone have any experience with the stemmers?  I know that Porter is
> what "everyone" uses.  Am I better off with KStemFilter (better
> performance) or something else?  Does anyone understand the differences
> between the various stemmers and how to choose one over another?
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
