Date: Thu, 8 Oct 2009 14:59:10 +0200
Subject: Re: Reverse stemmer?
From: Dawid Weiss
To: java-user@lucene.apache.org

Stemmers are heuristic transformations that aim to reduce the dimensionality of the vocabulary (they serve other purposes too, which I don't want to discuss here). For accurate transformations, one would use a lemmatization engine (typically dictionary-driven) combined with morphological analysis to resolve ambiguity.

So stemming should be seen as a "one-way" transformation from inflected forms to some unique identifier of a common lemma (a set of word forms with identical meaning).

I don't know if you can call it a "reverse stemmer", but there are tools that generate the inflected forms of lemmas (let's call them "root words") given a morphological tag or annotation. This is particularly useful for languages with rich inflection paradigms, because it lets you construct grammatically correct sequences of words. One example of such a project is Morfologik: http://morfologik.blogspot.com/

As Erick mentioned, though, this is probably far from what you actually need...

Dawid

On Tue, Oct 6, 2009 at 9:31 AM, David Leangen wrote:
>
> Hello,
>
> I've been using Lucene in a very basic way for some time now, and I'm
> starting to take advantage of some of the linguistic capabilities only now.
>
> I am making use of the snowball analyzer for stemming, and it works very
> well.
>
>
> Question: is there any such thing as a "reverse stemmer"? In other words,
> given the stem of a word, is there any algorithm to find the original word?
> Or is this just fantasy? ;-)
>
> Now, I understand that there is a 1:n mapping of stems:words. I can deal
> with that.
>
>
> Thanks!
> =David
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
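Since stemming is one-way, a common practical answer to the 1:n stem-to-word question is not to invert the stemmer at all but to remember, while indexing, which original words produced each stem. Below is a minimal Java sketch of that idea. It assumes a hypothetical toy suffix-stripping stemmer (toyStem) standing in for Snowball; it is not the Snowball algorithm, and ReverseStemIndex, add and formsOf are illustrative names, not Lucene APIs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Sketch of a "reverse stem" lookup table: because stemming is one-way,
 * we record which surface forms reduced to each stem as text is processed.
 */
class ReverseStemIndex {

    // stem -> every lowercased surface form seen in the input that reduced to it (1:n)
    private final Map<String, Set<String>> formsByStem = new TreeMap<>();

    /**
     * Hypothetical toy stemmer that strips a few English suffixes.
     * It stands in for a real stemmer such as Snowball and is NOT its algorithm.
     */
    static String toyStem(String word) {
        String w = word.toLowerCase();
        String[] suffixes = {"ing", "ed", "es", "s"};
        for (String suffix : suffixes) {
            // only strip when a reasonably long stem would remain
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Splits text on non-word characters, stems each token and records its original form. */
    public void add(String text) {
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            formsByStem.computeIfAbsent(toyStem(token), k -> new TreeSet<>())
                       .add(token.toLowerCase());
        }
    }

    /** Returns the recorded surface forms for a stem, or an empty set if it was never seen. */
    public Set<String> formsOf(String stem) {
        return formsByStem.getOrDefault(stem, Set.of());
    }

    public static void main(String[] args) {
        ReverseStemIndex index = new ReverseStemIndex();
        index.add("He jumped, she jumps, they are jumping; the jump was long.");
        System.out.println(index.formsOf(toyStem("jumping"))); // prints [jump, jumped, jumping, jumps]
    }
}
```

Note that this only ever returns words actually seen in the input; it cannot produce unseen inflections the way a dictionary-driven generator such as Morfologik can.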