lucene-java-user mailing list archives

From "Mike Klaas" <mike.kl...@gmail.com>
Subject Re: Looking for a stemmer that can return all inflected forms
Date Sun, 15 Oct 2006 00:26:34 GMT
On 10/14/06, Jong Kim <jkim@sitescape.com> wrote:
> Hi,
>
> I'm looking for a stemmer that is capable of returning all morphological
> variants  of a query term (to be used for high-recall search). For example,
> given a query term of 'cares', I would like to be able to generate 'cares',
> 'care', 'cared', and 'caring'.
>
> I looked at the Porter stemmer, Snowball stemmer, and the K-stem.
> All of them provide a method that takes a surface string ('cares') as an
> input and returns its base form/stem, which is 'care' in this example.
> But it appears that I can not use the stemmer to generate all of the
> inflected forms of a given query term.

Stemming is a multi-step, lossy, one-way operation, and it does not
surprise me that none of these packages attempts the reverse
operation.

My suggestion is to create a reverse stemmer yourself by taking the
lexicon of your corpus, stemming all the terms, and inverting the map
so that each stem points to every surface form that produced it. At
query time, stem the query term and look up its variants using a trie
or hash table.
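
A minimal sketch of that idea in Java, under some loud assumptions: the
class name ReverseStemmer and the toyStem function are hypothetical and
exist only for illustration. toyStem is a crude suffix stripper, not
Porter, Snowball, or K-stem; in practice you would pass in whichever
stemmer you already index with, and read the lexicon from your index's
term dictionary instead of a hard-coded list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Hypothetical helper: maps each stem back to the surface forms seen in a lexicon. */
final class ReverseStemmer {
    private final Map<String, SortedSet<String>> stemToForms = new HashMap<>();
    private final UnaryOperator<String> stemmer;

    /** Stems every lexicon term once and inverts the term -> stem mapping. */
    public ReverseStemmer(UnaryOperator<String> stemmer, Iterable<String> lexicon) {
        this.stemmer = stemmer;
        for (String term : lexicon) {
            stemToForms
                .computeIfAbsent(stemmer.apply(term), k -> new TreeSet<>())
                .add(term);
        }
    }

    /** Returns every lexicon term sharing a stem with queryTerm (empty if none). */
    public SortedSet<String> variants(String queryTerm) {
        SortedSet<String> forms = stemToForms.get(stemmer.apply(queryTerm));
        return forms == null
            ? Collections.emptySortedSet()
            : Collections.unmodifiableSortedSet(forms);
    }

    /** Toy stand-in for a real stemmer (assumption): strips one common suffix. */
    public static String toyStem(String term) {
        for (String suffix : new String[] {"ing", "ed", "es", "s", "e"}) {
            // Keep at least three characters so very short words are left alone.
            if (term.endsWith(suffix) && term.length() > suffix.length() + 2) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    public static void main(String[] args) {
        List<String> lexicon = Arrays.asList("care", "cares", "cared", "caring", "cart");
        ReverseStemmer reverse = new ReverseStemmer(ReverseStemmer::toyStem, lexicon);
        System.out.println(reverse.variants("cares")); // [care, cared, cares, caring]
    }
}
```

The sketch uses a HashMap for the lookup; a trie would serve the same
purpose. The returned variants can then be OR'ed together in the query.
Note that only forms that actually occur in the corpus are returned,
which is all a search needs, and that any words the stemmer conflates
incorrectly will show up together here as well.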

best,
-Mike
