lucene-solr-user mailing list archives

From Shashi Kant <sk...@sloan.mit.edu>
Subject Re: LucidWorks Solr
Date Wed, 21 Apr 2010 17:38:22 GMT
Why do these approaches have to be mutually exclusive?
Do a dictionary lookup first; if no satisfactory match is found, fall
back to an algorithmic stemmer. You would probably save a few CPU cycles
by running the algorithmic stemmer only when necessary.
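
A rough sketch of that fallback, in Python for brevity. Everything named
here is made up for illustration: the LEMMAS table, the suffix list, and
both function names. The suffix stripping is a toy stand-in, not Porter or
any real stemmer; an actual setup would plug in a proper lexicon and a
proper algorithmic stemmer.

```python
# Hypothetical lemma dictionary; a real one would be loaded from a lexicon.
LEMMAS = {
    "mice": "mouse",
    "went": "go",
    "better": "good",
}

# Toy suffix rules standing in for a real algorithmic stemmer (assumption).
SUFFIXES = ("ing", "ed", "s")


def algorithmic_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def hybrid_stem(word):
    """Dictionary lookup first; algorithmic stemming only on a miss."""
    token = word.lower()
    hit = LEMMAS.get(token)
    if hit is not None:
        return hit
    return algorithmic_stem(token)


if __name__ == "__main__":
    for w in ["mice", "went", "googling", "googled", "tweets", "tweet"]:
        print(w, "->", hybrid_stem(w))
```

With this toy data, "mice" and "went" are resolved by the dictionary,
while "googling"/"googled" both fall through to the suffix rules and
conflate to "googl", and "tweets"/"tweet" conflate to "tweet".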


On Wed, Apr 21, 2010 at 1:31 PM, Robert Muir <rcmuir@gmail.com> wrote:
> It's easy to look at the "faults" of some algorithmic stemmer, in truth its
> only purpose is to cause related forms of the word to conflate to the same
> form, and hopefully avoiding unrelated terms from conflating to this form.
>
> A dictionary-based stemmer is out-of-date the day you put it into
> production: languages aren't static. For example, you can't expect a
> dictionary-based stemmer to properly deal with forms like "googling" or
> "tweets" that have recently slipped into English vocabulary, but an
> algorithmic stemmer will likely deal with these just fine.
