lucene-java-user mailing list archives

From Dawid Weiss
Subject Re: Reverse stemmer?
Date Thu, 08 Oct 2009 12:59:10 GMT
Stemmers are heuristic transformations that aim to reduce the
dimensionality of the vocabulary (and serve other purposes I don't want
to discuss here). For accurate transformations one would use a
lemmatization engine (typically dictionary-driven) combined with
morphological analysis for ambiguity resolution. So stemming should
be regarded as a "one-way" transformation from inflected forms to
some form of unique identifier for a common lemma (a set of word
forms with identical meaning).
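
To make the "one-way" point concrete, here is a minimal Python sketch. It
assumes a toy suffix-stripping stemmer (the suffix list and the names
toy_stem and build_reverse_index are invented for illustration; these are
not the Snowball rules). The stem itself does not tell you which forms
produced it, so the only "reverse" available is a lookup table of the forms
you have actually seen, collected while stemming:

```python
from collections import defaultdict

# Hypothetical suffix list for illustration only; not the Snowball rules.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_reverse_index(words):
    """Map each stem to the set of surface forms that produced it."""
    index = defaultdict(set)
    for word in words:
        index[toy_stem(word)].add(word.lower())
    return index


corpus = ["walk", "walks", "walked", "walking", "talks", "talked"]
reverse_index = build_reverse_index(corpus)

print(sorted(reverse_index["walk"]))  # ['walk', 'walked', 'walking', 'walks']
print(sorted(reverse_index["talk"]))  # ['talked', 'talks']
```

Such a table only ever returns forms that occurred in the indexed text; it
cannot produce a form it has never seen, which is where dictionary-based
tools come in.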

I don't know if you can call it a "reverse stemmer", but there are
tools for generating inflected forms of lemmas (let's call them "root
words") given the morphological tag or annotation. This is
particularly useful for languages with rich inflection paradigms (so
that you can construct grammatically correct sequences of words). One
example of such a project is Morfologik:

Like Erick mentioned, though, this is probably far from what you
actually need...
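
For what it's worth, the "lemma plus morphological tag" generation mentioned
above can be sketched as a plain dictionary lookup. This is only an
illustration in Python: the table, the tag names, and the functions inflect
and all_forms are made up for the example and are not the API or tagset of
any particular tool.

```python
# Hypothetical hand-written dictionary: (lemma, tag) -> inflected form.
# The tag names are invented for this example.
INFLECTIONS = {
    ("go", "verb:past"): "went",
    ("go", "verb:3sg"): "goes",
    ("go", "verb:gerund"): "going",
    ("mouse", "noun:plural"): "mice",
}


def inflect(lemma, tag):
    """Return the inflected form for (lemma, tag), or None if unknown."""
    return INFLECTIONS.get((lemma, tag))


def all_forms(lemma):
    """Return every known form of a lemma, keyed by its tag."""
    return {tag: form for (lem, tag), form in INFLECTIONS.items() if lem == lemma}


print(inflect("go", "verb:past"))       # went
print(inflect("mouse", "noun:plural"))  # mice
```

Irregular forms such as "went" are the reason this has to be
dictionary-driven: no suffix rule applied to a stem would recover them.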


On Tue, Oct 6, 2009 at 9:31 AM, David Leangen wrote:
> Hello,
> I've been using Lucene in a very basic way for some time now, and I'm
> starting to take advantage of some of the linguistic capabilities only now.
> I am making use of the snowball analyzer for stemming, and it works very
> well.
> Question: is there any such thing as a "reverse stemmer"? In other words,
> given the stem of a word, is there any algorithm to find the original word?
> Or is this just fantasy? ;-)
> Now, I understand that there is a 1:n mapping of stems:words. I can deal
> with that.
> Thanks!
> =David
