lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven Rowe <sar...@syr.edu>
Subject Re: Looking for a stemmer that can return all inflected forms
Date Mon, 16 Oct 2006 15:26:44 GMT
Hi Jong,

Jong Kim wrote:
> I'm looking for a stemmer that is capable of returning all morphological 
> variants  of a query term (to be used for high-recall search). For example, 
> given a query term of 'cares', I would like to be able to generate 'cares', 
> 'care', 'cared', and 'caring'.

To achieve high recall, can't you just stem terms both when indexing and
when querying?

The only reasons I can think of not to do so are to support both
high-precision and high-recall queries with the same index, or to give
greater weight to exact-match documents than to stemmed-match documents
within a single query.  If either of these is the case, you could
maintain two fields (one stemmed and one non-stemmed) or two indexes,
and choose which field/index to use (reason #1) or combine the two in a
single query (reason #2).

Actually, now that I think about it more, two indexes for reason #2
(i.e., stemmed match as fallback if exact match fails) would be tricky,
due to issues with fusion of search results -- better to use two fields
in a single index for this case.

Steve


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message