lucene-dev mailing list archives

From David Spencer
Subject Kstem vs. Snowball? -- Re: Porter Stemmer
Date Tue, 02 Mar 2004 00:25:15 GMT
Erik Hatcher wrote:

> On Feb 24, 2004, at 12:33 PM, Michael McGrady wrote:
>> This conversation is a mystery to me.  Is there some different Porter 
>> stemmer than the one available in the Lucene source code?
> Yes.  As mentioned, the snowball analyzer family lives in the sandbox.  
> The CVS repository is jakarta-lucene-sandbox - look under 
> contributions/snowball for more details.  Dr. Porter's website contains 
> details on why he developed snowball over the original Porter stemmer.

Out of curiosity can anyone comment on how Snowball compares with KStem, 
which appeared on the mailing list around this thread:

Also, I thought I read somewhere about newer stemmers that can
return multiple stems for a word, but on examination neither KStem nor
Snowball seems to fit this description. Memory fault?

>     Erik
>> At 09:03 AM 2/24/2004, you wrote:
>>> On Feb 24, 2004, at 10:03 AM, Grant Ingersoll wrote:
>>>> Is there any reason why the PorterStemmer can't be made public?  I 
>>>> know several people have submitted this patch, both separately and 
>>>> as part of other patches.  I, for one, am using it in other places 
>>>> as part of my overall search solution and I bet others are as well.  
>>>> I guess I could understand if all stemmers were that way, but the 
>>>> GermanStemmer is publicly available, so it doesn't seem to be 
>>>> consistent.
>>>> Just wondering...
>>> I think we can make it public.  But an alternative is to use the 
>>> snowball code in the sandbox, which has a public PorterStemmer.
