lucene-general mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Snowball Java EnglishStemmer: Porter or Porter2?
Date Mon, 23 May 2005 16:12:27 GMT

On May 22, 2005, at 1:53 PM, Steve Legrand wrote:

> Does the java-version of Snowball employ Porter or Porter2 stemming  
> algorithm in its EnglishStemmer available from the Lucene Sandbox?  
> If it is Porter2, I should get the word "his" indexed as "his" not  
> as "hi" as it does at the moment.

I don't know which algorithm each one implements, but SnowballAnalyzer
offers three stemmers for English: "English", "Lovins", and "Porter".
I just ran each of them with the AnalyzerDemo and got this output
analyzing the string "his hiss history":

   SnowballAnalyzer:  // English
     [his] [hiss] [histori]

   SnowballAnalyzer:  // Lovins
     [his] [his] [history]

   SnowballAnalyzer:  // Porter
     [hi] [hiss] [histori]

Only the "Lovins" one does what seems to be the right thing with  
"his", except that it does a bad job with words like "country" and  
"countries".
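
For what it's worth, the [hi] versus [his] difference in the output above is
consistent with the plural-stripping rule (step 1a) of the two Porter
variants. The sketch below is only an illustration, not the Snowball source.
It assumes that "Porter" follows the original 1980 rules and that "English"
follows the revised rules commonly called Porter2; it implements step 1a
alone and treats "y" as a vowel everywhere. The class and method names are
made up for the example.

```java
// Illustrative sketch of step 1a only; names are hypothetical, not Snowball's.
final class Step1aSketch {

    // Original Porter step 1a: the longest matching suffix wins.
    static String porterStep1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2); // sses -> ss
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2); // ies  -> i
        if (w.endsWith("ss"))   return w;                              // ss   -> ss
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1); // s    -> (dropped)
        return w;
    }

    // Revised (Porter2) step 1a, simplified: "y" always counts as a vowel here.
    static String porter2Step1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2); // sses -> ss
        if (w.endsWith("ied") || w.endsWith("ies")) {
            // "i" if more than one letter precedes the suffix, otherwise "ie"
            return w.length() > 4
                    ? w.substring(0, w.length() - 2)
                    : w.substring(0, w.length() - 1);
        }
        if (w.endsWith("us") || w.endsWith("ss")) return w;            // left alone
        if (w.endsWith("s")) {
            // Drop the "s" only if a vowel occurs somewhere before the letter
            // that immediately precedes it.
            for (int i = 0; i < w.length() - 2; i++) {
                if ("aeiouy".indexOf(w.charAt(i)) >= 0) {
                    return w.substring(0, w.length() - 1);
                }
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"his", "hiss"}) {
            System.out.println(w + " -> porter: " + porterStep1a(w)
                    + ", porter2: " + porter2Step1a(w));
        }
    }
}
```

Under these assumptions, "his" loses its "s" with the original rule but keeps
it with the revised one, because the only vowel sits directly before the "s".
"hiss" is untouched by both. The later steps that turn "history" into
"histori" are not covered here.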

     Erik

