Reply-To: dev@lucene.apache.org
Date: Wed, 1 Jun 2011 16:24:47 +0000 (UTC)
From: "Yonik Seeley (JIRA)"
To: dev@lucene.apache.org
Message-ID: <81326900.60170.1306945487682.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (LUCENE-152) [PATCH] KStem for Lucene

     [ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-152:
--------------------------------

    Attachment: lucid_kstem.tgz

OK
folks, here's Lucid's optimized version of kstemmer. Changes by Lucid to the original kstemmer are being contributed under the ASL.
This is not a patch, but simply a tarball of Lucid's version. Not sure what we want to do with some of the stuff (like the biggish test files).

IIRC, there were two types of optimizations... one type was efficiency (e.g. using CharArrayMap, directly using a char[] in the stemmer, etc). Other optimizations actually changed the logic and code paths though, which is one reason I tested it over a whole document to ensure its output still matched the original's.

> [PATCH] KStem for Lucene
> ------------------------
>
> Key: LUCENE-152
> URL: https://issues.apache.org/jira/browse/LUCENE-152
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: unspecified
> Environment: Operating System: other
> Platform: Other
> Reporter: Otis Gospodnetic
> Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: lucid_kstem.tgz
>
>
> September 10th 2003 contribution from "Sergio Guzman-Lara"
> Original email:
> Hi all,
> I have ported the kstem stemmer to Java and incorporated it into
> Lucene. You can get the source code (Kstem.jar) from the following website:
> http://ciir.cs.umass.edu/downloads/
> Just click on "KStem Java Implementation" (you will need to register
> your e-mail, for free of course, with the CIIR --Center for Intelligent
> Information Retrieval, UMass -- and get an access code).
> Content of Kstem.jar:
> java/org/apache/lucene/analysis/KStemData1.java
> java/org/apache/lucene/analysis/KStemData2.java
> java/org/apache/lucene/analysis/KStemData3.java
> java/org/apache/lucene/analysis/KStemData4.java
> java/org/apache/lucene/analysis/KStemData5.java
> java/org/apache/lucene/analysis/KStemData6.java
> java/org/apache/lucene/analysis/KStemData7.java
> java/org/apache/lucene/analysis/KStemData8.java
> java/org/apache/lucene/analysis/KStemFilter.java
> java/org/apache/lucene/analysis/KStemmer.java
>
> KStemData1.java, ..., KStemData8.java   Contain several lists of words
>                                         used by Kstem
> KStemmer.java                           Implements the Kstem algorithm
> KStemFilter.java                        Extends TokenFilter, applying Kstem
>
> To compile
> Unjar the file Kstem.jar to Lucene's "src" directory, and compile it
> there.
>
> What is Kstem?
> A stemmer designed by Bob Krovetz (for more information see
> http://ciir.cs.umass.edu/pubfiles/ir-35.pdf).
>
> Copyright issues
> This is open source. The actual license agreement is included at the
> top of every source file.
>
> Any comments/questions/suggestions are welcome,
> Sergio Guzman-Lara
> Senior Research Fellow
> CIIR UMass
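
The comment above mentions running the optimized stemmer over a whole document to confirm it still matched the original. A minimal sketch of that kind of check in plain Java is below. Everything in it is hypothetical and for illustration only: the class name StemmerComparison, the tokenizer, and the two toy "strip a trailing s" stemmers are stand-ins, not code from Lucene or from lucid_kstem.tgz. The second toy stemmer works on a char[] only to mirror the style of optimization described; it is not the KStem algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical harness: none of these names come from Lucene or lucid_kstem.tgz.
class StemmerComparison {

    // Toy stand-in for an "original" stemmer: String-based, strips one trailing 's'.
    static String referenceStem(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Toy stand-in for an "optimized" stemmer: the same rule, applied on a char[].
    static String optimizedStem(String word) {
        char[] buf = word.toCharArray();
        int len = buf.length;
        if (len > 3 && buf[len - 1] == 's' && buf[len - 2] != 's') {
            len--;
        }
        return new String(buf, 0, len);
    }

    // Runs both stemmers over every token and collects the tokens on which they disagree.
    static List<String> findMismatches(Function<String, String> reference,
                                       Function<String, String> candidate,
                                       Iterable<String> tokens) {
        List<String> mismatches = new ArrayList<>();
        for (String token : tokens) {
            String expected = reference.apply(token);
            String actual = candidate.apply(token);
            if (!expected.equals(actual)) {
                mismatches.add(token + ": expected " + expected + ", got " + actual);
            }
        }
        return mismatches;
    }

    // Lower-cases the text and splits it on anything that is not an ASCII letter.
    static List<String> tokenize(String document) {
        List<String> tokens = new ArrayList<>();
        for (String t : document.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Stemmers reduce words; the glass class passes tests.");
        List<String> mismatches = findMismatches(
                StemmerComparison::referenceStem, StemmerComparison::optimizedStem, tokens);
        System.out.println(mismatches.isEmpty() ? "outputs match" : String.join("\n", mismatches));
    }
}
```

To compare real implementations, the two method references passed to findMismatches would be swapped for wrappers around the original and the optimized stemmer, and the sample sentence for the text of a large document. An empty result only shows agreement on the tokens that document happens to contain.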