lucene-solr-user mailing list archives

From Andrew Stromnov <strom...@gmail.com>
Subject Re: Stemmer bug?
Date Tue, 10 Jul 2007 22:12:53 GMT

Hi

RussianAnalyzer produces Russian stemmed forms, but
SnowballPorterFilterFactory with language="Russian" leaves _all_ Russian
content unchanged.


hossman wrote:
> 
> 
> : Subject: Stemmer bug?
> 
> can you elaborate on what exactly you view as a bug?
> 
> if the issue is just that one of the examples stems something in a way
> that you think makes sense, but the other one does not, that really isn't a
> bug so much as it is a comment on the effectiveness of the Snowball
> Stemmer for Russian vs the RussianStemmer class used by the
> RussianAnalyzer.  if you like the stemming that comes out of the
> RussianAnalyzer you can use the RussianStemFilter yourself by creating a
> simple FilterFactory around it (there are lots of examples in the Solr
> code base)
> 
> Also keep in mind that the Snowball Stemmer is not designed to produce
> "real" words when it stems ... it's an algorithmic stemmer designed to
> produce artificial stems for common cases ... so if you think it's a bug
> because it produces terms that aren't real words -- it's not, that's just
> the way it works -- what matters is that it produces the same artificial
> stem for related words.
> 
> -Hoss
> 
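
To illustrate the point about artificial stems, here is a minimal sketch of a
suffix-stripping stemmer. It is a toy: the class name ToyStemmer and its three
rules are hypothetical and are not the Snowball or RussianStemmer algorithm.
It only shows that related words can share a stem that is not itself a real
word, which is all that matching at index and query time needs.

```java
import java.util.Locale;

public class ToyStemmer {
    // Hypothetical, deliberately tiny rule set for illustration only.
    // This is NOT the Snowball algorithm and NOT Lucene's RussianStemmer.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ies") && w.length() > 4) {
            // "ponies" -> "poni"
            return w.substring(0, w.length() - 3) + "i";
        }
        if (w.endsWith("y") && w.length() > 2) {
            // "pony" -> "poni"
            return w.substring(0, w.length() - 1) + "i";
        }
        if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 3) {
            // plain plural: "stems" -> "stem"
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        String[] words = {"pony", "ponies", "query", "queries"};
        for (String w : words) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Both "pony" and "ponies" come out as "poni", which is not a word, yet a search
for either form would match documents containing the other.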

-- 
View this message in context: http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11530601
Sent from the Solr - User mailing list archive at Nabble.com.

