lucene-java-user mailing list archives

From Mathieu Lecarme <math...@garambrogne.net>
Subject Re: stemming in Lucene
Date Wed, 02 Apr 2008 11:56:21 GMT
Wojtek H wrote:
> Hi all,
>
> Snowball stemmers are part of Lucene, but only for a few languages. We
> have documents in various languages and so need stemmers for many
> languages (in particular Polish). One idea is to use ispell
> dictionaries. There are ispell dicts for many languages, so this
> solution is good for a multilingual environment. Maybe this is not
> the perfect place to ask, but does anyone know of a Java stemmer that
> uses ispell dicts?
> There is an aspell-like Java spell-checker (Jazzy), but I could not see
> how to use it for stemming. We are considering porting part of the
> Postgres tsearch module to Java, because tsearch uses ispell dicts for
> stemming.
> But maybe there is a better way, or there are people working on
> something like that?
>   
ispell data is nice for phonetic matching and for enumerating a huge
list of words. An ispell dictionary works in one direction only:
pseudo-root => word. Building the inverse function looks hard, because
a lemma is split across multiple affixes. But the data can be used to
derive stemming rules, just as http://www.getopt.org/stempel/ does.
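
To make the one-way mapping concrete, here is a rough Java sketch. It is
not the ispell file format, not Stempel's algorithm, and not a Lucene API:
the class IspellExpansionSketch, the SuffixRule type, and the toy English
rules and dictionary are all made up for illustration. Real affix files
also have prefixes and match conditions, which are left out here. The
sketch expands each pseudo-root with its flagged suffix rules, then builds
the inverse table by brute-force enumeration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IspellExpansionSketch {

    /** Hypothetical, simplified suffix rule: drop `strip` from the end, then add `append`. */
    public static final class SuffixRule {
        final char flag;
        final String strip;
        final String append;

        public SuffixRule(char flag, String strip, String append) {
            this.flag = flag;
            this.strip = strip;
            this.append = append;
        }

        /** Forward direction (root => word); returns null if the rule does not apply. */
        String apply(String root) {
            if (!root.endsWith(strip)) {
                return null;
            }
            return root.substring(0, root.length() - strip.length()) + append;
        }
    }

    /** Toy rules, assumed for this example only. */
    public static List<SuffixRule> demoRules() {
        return Arrays.asList(
                new SuffixRule('S', "", "s"),
                new SuffixRule('G', "e", "ing"),
                new SuffixRule('D', "", "d"));
    }

    /** Toy dictionary: pseudo-root => affix flags. */
    public static Map<String, String> demoDictionary() {
        Map<String, String> dict = new LinkedHashMap<>();
        dict.put("bake", "SGD");
        dict.put("baking", "S"); // the noun, listed as its own root
        return dict;
    }

    /** Enumerates the root itself plus every form produced by its flagged rules. */
    public static List<String> expand(String root, String flags, List<SuffixRule> rules) {
        List<String> forms = new ArrayList<>();
        forms.add(root);
        for (SuffixRule rule : rules) {
            if (flags.indexOf(rule.flag) >= 0) {
                String form = rule.apply(root);
                if (form != null) {
                    forms.add(form);
                }
            }
        }
        return forms;
    }

    /** Builds word => roots by expanding every dictionary entry. */
    public static Map<String, Set<String>> invert(Map<String, String> dict, List<SuffixRule> rules) {
        Map<String, Set<String>> formToRoots = new HashMap<>();
        for (Map.Entry<String, String> entry : dict.entrySet()) {
            for (String form : expand(entry.getKey(), entry.getValue(), rules)) {
                formToRoots.computeIfAbsent(form, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return formToRoots;
    }

    public static void main(String[] args) {
        List<SuffixRule> rules = demoRules();
        // prints [bake, bakes, baking, baked]
        System.out.println(expand("bake", "SGD", rules));
        // prints [bake, baking]: one surface form, two candidate roots
        System.out.println(invert(demoDictionary(), rules).get("baking"));
    }
}
```

The inverse table only covers words the dictionary can generate, and a
single form can map back to several roots, so enumeration alone does not
give a stemmer for unseen words. That is why learning rules from the
generated pairs looks like the more practical use of the data.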

M.


