From: Andy
Subject: Re: LucidWorks Solr
To: solr-user@lucene.apache.org
Date: Mon, 19 Apr 2010 05:45:40 -0700 (PDT)

Thanks for the explanation, Mitch. You're right. There can't be universal stemmers.

What about multi-language stemmers? I'm mostly interested in English, Spanish, German, French, and Italian. Are there any stemmers that would handle those languages? If not, what's the recommended way to deal with documents in multiple languages?

--- On Mon, 4/19/10, MitchK wrote:

> From: MitchK
> Subject: Re: LucidWorks Solr
> To: solr-user@lucene.apache.org
> Date: Monday, April 19, 2010, 4:36 AM
>
> Andy, I think it is important to know what a stemmer really is.
>
> It reduces words to their infinitives. Those infinitives are not always
> the real infinitive, but for the system it is an infinitive, since all
> its derived forms can be reduced to the same form. That's a stemmer.
>
> According to this, there can't be one stemmer for every language,
> because every language has its own rules for reducing a word to its
> infinitive.
>
> If you apply an English stemmer to a German document, the results might
> be unexpected. However, sometimes it still works well enough.
>
> Keep in mind that this is an algorithm. It is not important whether the
> created infinitive is the real infinitive. It is only important that
> most of the derived forms can be reduced to the same basic form.
> Please ask if something is not clear.
>
> KStem:
> The wiki[1] says that KStem is less aggressive than the standard
> stemmer. I guess this means that there are more rules for how to reduce
> a word to its infinitive, and so the results might be better.
>
> [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
>
> Kind regards
> - Mitch
> --
> View this message in context: http://n3.nabble.com/LucidWorks-Solr-tp727341p729110.html
> Sent from the Solr - User mailing list archive at Nabble.com.
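
To make the quoted point concrete, here is a minimal Python sketch of a suffix-stripping stemmer. It is only a toy: the suffix lists, the minimum stem length, and the `stem` / `stem_all` helpers are assumptions made up for illustration, not the rules used by Solr's stemmers or by KStem. It shows that the output only has to be a shared form, and that rules written for one language may fail to collapse the forms of another.

```python
# Toy suffix-stripping stemmer. The rule tables below are illustrative
# assumptions, NOT the rules of Solr's stemmers or of KStem.
SUFFIX_RULES = {
    "en": ["ingly", "edly", "ing", "ed", "ly", "es", "s"],
    "de": ["ungen", "ung", "en", "er", "es", "e", "n", "s"],
}

# Hypothetical guard so very short words are not stripped to nothing.
MIN_STEM_LENGTH = 3


def stem(word, lang):
    """Strip the first matching suffix from the rule list for `lang`."""
    word = word.lower()
    for suffix in SUFFIX_RULES[lang]:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def stem_all(words, lang):
    """Return the set of distinct stems produced for `words`."""
    return {stem(w, lang) for w in words}


if __name__ == "__main__":
    english = ["walks", "walked", "walking"]
    german = ["Kind", "Kinder", "Kindes"]

    # English rules on English words: all forms share one stem.
    print(sorted(stem_all(english, "en")))
    # German rules on German words: all forms share one stem.
    print(sorted(stem_all(german, "de")))
    # English rules on German words: the forms no longer collapse.
    print(sorted(stem_all(german, "en")))
```

With these toy rules the English and German word lists each collapse to a single form under their own rule set, while the German list under the English rules yields two different forms. One way to read that for the multi-language question is that each document's language has to be known first so the matching rule set can be chosen, for example by keeping text of different languages in separate fields.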