Message-ID: <4043D560.4080001@tropo.com>
Date: Mon, 01 Mar 2004 16:29:20 -0800
From: David Spencer
To: Lucene Developers List
Subject: Kstem vs. Snowball? -- Re: Porter Stemmer
In-Reply-To: <2A0D0BAD-66FF-11D8-8122-000393A564E6@ehatchersolutions.com>

Erik Hatcher wrote:

> On Feb 24, 2004, at 12:33 PM, Michael McGrady wrote:
>
>> This conversation is a mystery to me. Is there some different Porter
>> stemmer than the one available in the Lucene source code?
>
> Yes. As mentioned, the snowball analyzer family lives in the sandbox.
> The CVS repository is jakarta-lucene-sandbox - look under
> contributions/snowball for more details. Dr. Porter's website contains
> details on why he developed snowball over the original Porter stemmer.

Out of curiosity can anyone comment on how Snowball compares with KStem,
which appeared on the mailing list around this thread:

http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg03740.html

Also, I thought I read somewhere about new stemmers existing that can
return multiple stems for a word - but on examination neither KStem nor
Snowball seem to fit this description. Memory fault?

> Erik
>
>> At 09:03 AM 2/24/2004, you wrote:
>>
>>> On Feb 24, 2004, at 10:03 AM, Grant Ingersoll wrote:
>>>
>>>> Is there any reason why the PorterStemmer can't be made public? I
>>>> know several people have submitted this patch, both separately and
>>>> as part of other patches. I, for one, am using it in other places
>>>> as part of my overall search solution and I bet others are as well.
>>>> I guess I could understand if all stemmers were that way, but the
>>>> GermanStemmer is publicly available, so it doesn't seem to be
>>>> consistent.
>>>>
>>>> Just wondering...
>>>
>>> I think we can make it public.
>>> But an alternative is to use the snowball code in the sandbox, which
>>> has a public PorterStemmer.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org