lucene-general mailing list archives

From "Steve Legrand" <steve...@hotmail.com>
Subject Re: Snowball Java EnglishStemmer: Porter or Porter2?
Date Mon, 23 May 2005 23:20:29 GMT
Thanks, Eric

I debugged my code and noticed that I had indexed one set of my files using
the older PorterAnalyzer but ran the search with the SnowballAnalyzer. Now I
use Snowball's Porter algorithm (net.sf.snowball) for both indexing and
search across all the file sets, and everything works fine.
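
For anyone hitting the same symptom, here is a minimal sketch of why the
mismatch bites. It does not use Lucene at all: STRIP_S and KEEP are
hypothetical toy stand-ins for two analyzers that stem differently, and
index() and search() are made-up helpers, not Lucene API. The only point is
that the query term must be normalized the same way as the indexed terms.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

class AnalyzerMismatchDemo {
    // Toy "analyzer" A (assumption, not a real stemmer): drops one trailing
    // "s" unless the word ends in "ss".
    static final UnaryOperator<String> STRIP_S =
        w -> (w.endsWith("s") && !w.endsWith("ss"))
            ? w.substring(0, w.length() - 1)
            : w;

    // Toy "analyzer" B (assumption): leaves every token unchanged.
    static final UnaryOperator<String> KEEP = w -> w;

    // Build a bare-bones "index": the set of stemmed terms in the text.
    static Set<String> index(String text, UnaryOperator<String> stemmer) {
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(stemmer.apply(token));
            }
        }
        return terms;
    }

    // Look up a single-word query after passing it through the given stemmer.
    static boolean search(Set<String> terms, String query,
                          UnaryOperator<String> stemmer) {
        return terms.contains(stemmer.apply(query.toLowerCase()));
    }

    public static void main(String[] args) {
        Set<String> terms = index("his hiss history", STRIP_S);
        // Same stemmer on both sides: the query term matches the index.
        System.out.println(search(terms, "his", STRIP_S));
        // Different stemmer at search time: "his" was stored as "hi".
        System.out.println(search(terms, "his", KEEP));
    }
}
```

With the same toy stemmer on both sides the lookup succeeds; with different
ones the term was stored as "hi" but queried as "his", so nothing matches.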

Cheerio, Steve

Steve Legrand

>
>On May 22, 2005, at 1:53 PM, Steve Legrand wrote:
>
>>Does the Java version of Snowball employ the Porter or the Porter2 stemming
>>algorithm in its EnglishStemmer available from the Lucene Sandbox? If it
>>is Porter2, I should get the word "his" indexed as "his", not as "hi" as
>>it is at the moment.
>
>I don't know the specifics of which algorithm, but there are three
>different SnowballAnalyzer stemmers for English - "English", "Lovins" and
>"Porter". I just ran each of the English stemmers with the AnalyzerDemo
>and got this output analyzing the string "his hiss history":
>
>   SnowballAnalyzer:  // English
>     [his] [hiss] [histori]
>
>   SnowballAnalyzer:  // Lovins
>     [his] [his] [history]
>
>   SnowballAnalyzer:  // Porter
>     [hi] [hiss] [histori]
>
>Only the "Lovins" one does what seems to be the right thing with "his",
>except that it does a bad job with words like "country" and "countries".
>
>     Erik
>
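
A note on the output quoted above: as far as I know, Snowball's "English"
stemmer is Porter2 and "Porter" is the original algorithm, and the "his"
difference comes from the first suffix step (step 1a). Below is a rough
sketch of that one step only, written from the published rule descriptions.
The class and method names are my own, it is not the Snowball-generated
code, and it ignores all later steps (so "history" stays as it is here) as
well as Porter2's special handling of "y" and of very short words.

```java
class Step1aSketch {
    // Simplification (assumption): treats every "y" as a vowel.
    static boolean isVowel(char c) {
        return "aeiouy".indexOf(c) >= 0;
    }

    // Original Porter, step 1a: sses -> ss, ies -> i, ss -> ss, s -> "".
    static String porterStep1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ies")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ss")) return w;
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Porter2 ("English"), step 1a only.
    static String porter2Step1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ied") || w.endsWith("ies")) {
            // -> "i" if more than one letter precedes the suffix, else "ie".
            return (w.length() - 3 > 1)
                ? w.substring(0, w.length() - 2)
                : w.substring(0, w.length() - 1);
        }
        if (w.endsWith("us") || w.endsWith("ss")) return w;
        if (w.endsWith("s")) {
            // Delete the "s" only if there is a vowel somewhere before it
            // other than in the position immediately preceding the "s".
            for (int i = 0; i < w.length() - 2; i++) {
                if (isVowel(w.charAt(i))) {
                    return w.substring(0, w.length() - 1);
                }
            }
            return w;
        }
        return w;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"his", "hiss", "gaps", "this"}) {
            System.out.println(word + ": Porter=" + porterStep1a(word)
                + " Porter2=" + porter2Step1a(word));
        }
    }
}
```

In "his" the only vowel sits directly before the "s", so the Porter2 rule
keeps the word whole, while the original Porter rule strips any lone
trailing "s" and yields "hi".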
