From: Daniel Naber
To: java-user@lucene.apache.org
Subject: Re: What is stemming?
Date: Sun, 20 Nov 2005 18:55:46 +0100

On Sunday, 20 November 2005 16:48, Koji Sekiguchi wrote:

> Could someone explain what "stemming" is?

Stemming usually means cutting off characters from the end of a word, e.g. 
walked -> walk, walking -> walk. However, this does not necessarily produce 
a real word, e.g. a stemmer could also change "house" and "houses" to "hous". 
Also, cutting off characters isn't enough for irregular words, e.g. you 
cannot get from "went" to "go" just by cutting off characters.

A lemmatizer solves these problems, i.e. it always produces real words, 
even for irregular forms. It usually needs a table of irregular forms for 
this.

Regards
 Daniel

-- 
http://www.danielnaber.de
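
To make the difference described above concrete, here is a minimal sketch in Python. It is a toy, not the Porter algorithm and not anything Lucene ships: the suffix list, the function names naive_stem and naive_lemmatize, the IRREGULAR table and the VOCABULARY set are all hypothetical and chosen only to reproduce the examples from the message.

```python
# Toy illustration only: NOT the Porter stemmer and NOT a Lucene API.
# The suffix list below is an assumption chosen to reproduce the examples.
SUFFIXES = ("ing", "ed", "es", "e", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical table of irregular forms, as a lemmatizer would need.
IRREGULAR = {"went": "go", "gone": "go", "mice": "mouse"}

# Hypothetical vocabulary of known base forms (real words).
VOCABULARY = {"walk", "house", "go", "mouse"}


def naive_lemmatize(word):
    """Return a real base form, using the irregular table first."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word in VOCABULARY:
        return word
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            # Accept a candidate only if it is a known word.
            for candidate in (base, base + "e"):
                if candidate in VOCABULARY:
                    return candidate
    return word  # unknown word: leave unchanged


if __name__ == "__main__":
    for w in ("walked", "walking", "house", "houses", "went"):
        print(w, "->", naive_stem(w), "|", naive_lemmatize(w))
```

With these assumptions the stemmer maps both "house" and "houses" to the non-word "hous" and leaves "went" untouched, while the lemmatizer returns "house" and "go" because it consults its vocabulary and its table of irregular forms.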