From: Mathieu Lecarme
Date: Wed, 02 Apr 2008 13:56:21 +0200
To: java-user@lucene.apache.org
Subject: Re: stemming in Lucene
Message-ID: <47F37465.70201@garambrogne.net>
In-Reply-To: <329a6c830804010258jd8df420lcf0f777df9fc36f3@mail.gmail.com>

Wojtek H wrote:
> Hi all,
>
> Snowball stemmers are part of Lucene, but only for a few languages. We
> have documents in various languages and so need stemmers for many
> languages (in particular Polish). One idea is to use ispell
> dictionaries. There are ispell dicts for many languages, so this
> solution is good for a multilingual environment. Maybe this is not
> the perfect place to ask, but does anyone know of a Java stemmer using
> ispell dicts?
> There is an aspell-like Java spell-checker (Jazzy), but I could not see
> how to use it for stemming. We are considering porting part of the
> Postgres tsearch module to Java, because tsearch uses ispell dicts for
> stemming.
> But maybe there is a better way, or there are people working on
> something like that?
>

ispell data is nice for phonetics, and for enumerating a huge list of words.
An ispell dictionary works in one direction only: pseudo root => word. The
inverse function looks hard to build, because a lemma is split across
multiple affixes.
But the data can be used to find rules, just as
http://www.getopt.org/stempel/ does.

M.
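
To illustrate the "one direction only" point, here is a minimal sketch of one way around it: expand every root with its affix rules, then index each generated form by its root. Everything in it is an assumption made for illustration: the class and method names, the flag letters, and the simplified "root/FLAGS" entry format with strip/add suffix rules. It is not the real ispell affix file syntax, and it is not taken from tsearch or Stempel.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AffixExpansionSketch {

    /** Toy suffix rule (hypothetical format): if the root ends with strip, drop it and append add. */
    public static final class SuffixRule {
        final String strip;
        final String add;

        SuffixRule(String strip, String add) {
            this.strip = strip;
            this.add = add;
        }

        boolean appliesTo(String root) {
            return root.endsWith(strip);
        }

        String apply(String root) {
            return root.substring(0, root.length() - strip.length()) + add;
        }
    }

    /** Flag letter -> rule. The letters and rules are made up for this sketch. */
    public static Map<Character, SuffixRule> toyRules() {
        Map<Character, SuffixRule> rules = new HashMap<>();
        rules.put('S', new SuffixRule("", "s"));     // axe  -> axes
        rules.put('G', new SuffixRule("e", "ing"));  // bake -> baking
        rules.put('X', new SuffixRule("is", "es"));  // axis -> axes
        return rules;
    }

    /** Forward direction: one "root/FLAGS" entry -> the root plus every derived form. */
    public static Set<String> expand(String entry, Map<Character, SuffixRule> rules) {
        String[] parts = entry.split("/", 2);
        String root = parts[0];
        Set<String> forms = new LinkedHashSet<>();
        forms.add(root);
        if (parts.length == 2) {
            for (char flag : parts[1].toCharArray()) {
                SuffixRule rule = rules.get(flag);
                if (rule != null && rule.appliesTo(root)) {
                    forms.add(rule.apply(root));
                }
            }
        }
        return forms;
    }

    /** Inverse direction by brute force: enumerate all forms and index each by its root(s). */
    public static Map<String, Set<String>> buildReverseIndex(List<String> entries,
                                                             Map<Character, SuffixRule> rules) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String entry : entries) {
            String root = entry.split("/", 2)[0];
            for (String form : expand(entry, rules)) {
                index.computeIfAbsent(form, k -> new TreeSet<>()).add(root);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("axe/S", "axis/X", "bake/SG");
        Map<String, Set<String>> index = buildReverseIndex(entries, toyRules());
        System.out.println("baking -> " + index.get("baking"));
        System.out.println("axes   -> " + index.get("axes"));
    }
}
```

In this toy data "axes" maps back to both "axe" and "axis", so even a fully enumerated table does not give a single-valued inverse. The table also only covers words that some entry generates, which is one reason a rule-learning approach such as Stempel may be more practical.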