lucene-solr-user mailing list archives

From Andy <angelf...@yahoo.com>
Subject Re: LucidWorks Solr
Date Mon, 19 Apr 2010 12:47:17 GMT
Thanks for the tip.

Is there a publicly available dictionary of morphologies that I could use? Or did you build
your own?


--- On Mon, 4/19/10, Darren Govoni <darren@ontrenet.com> wrote:

> From: Darren Govoni <darren@ontrenet.com>
> Subject: Re: LucidWorks Solr
> To: solr-user@lucene.apache.org
> Date: Monday, April 19, 2010, 7:39 AM
> Regarding stemmers, I ditched them altogether a long time ago in favor
> of a dictionary of morphologies of all known words (for any given
> language). A simple lookup of any word's morphology thus produces the set,
> including the correct stem.
> 
> Works great. 100% of the time.
> 
> Just a tip from me.
> 
> 
> On Mon, 2010-04-19 at 00:36 -0800, MitchK wrote:
> 
> > Andy, I think it is important to know what a stemmer really is.
> > 
> > It reduces words to their infinitives. Those infinitives are not always
> > the real infinitive; however, for the system it is an infinitive,
> > since all its derivatives can be reduced to the same form.
> > That's a stemmer.
> > 
> > Accordingly, there can't be one stemmer that works for every language, because
> > every language has its own rules for reducing a word to its
> > infinitive.
> > 
> > If you apply a stemmer for English to a German document, the
> > results might be unexpected. However, sometimes it still works well enough.
> > 
> > Keep in mind that this is an algorithm. It is not important whether the
> > created infinitive is the real infinitive. It is only important that most of
> > the derived forms can be reduced to the same basic form. Please ask if
> > something is not clear.
> > 
> > KStem:
> > The wiki[1] says that KStem is less aggressive than the standard stemmer.
> > I guess that this means that there are more rules for how to reduce a word
> > to its infinitive, and accordingly the results might be better.
> > 
> > 
> > [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
> > 
> > Kind regards
> > - Mitch
> 
> 
> 
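
To make the two approaches discussed above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny MORPHOLOGY table is hand-written toy data (not a real or published dictionary), the function names are hypothetical, and toy_stem is a crude suffix stripper, not the Porter stemmer or KStem.

```python
# Toy morphology table: lemma -> set of known surface forms.
# Hand-written for illustration; a real dictionary would be far larger.
MORPHOLOGY = {
    "run": {"run", "runs", "ran", "running"},
    "mouse": {"mouse", "mice"},
}

# Inverted index: surface form -> lemma, built once for fast lookup.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in MORPHOLOGY.items() for form in forms
}


def lookup_lemma(word):
    """Return the dictionary lemma for a word, or None if it is unknown."""
    return FORM_TO_LEMMA.get(word.lower())


def lookup_forms(word):
    """Return every known form that shares a lemma with the given word."""
    lemma = lookup_lemma(word)
    return MORPHOLOGY[lemma] if lemma is not None else set()


def toy_stem(word):
    """Strip one common English suffix. Illustrative only."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

In this sketch toy_stem maps "jumping", "jumped" and "jumps" to the same form, which is all the algorithm needs, but it turns "running" into "runn" and leaves "ran" untouched. The dictionary lookup handles "ran" and "mice" correctly, but returns nothing for any word missing from the table.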


      
