lucene-java-user mailing list archives

From Daniel Naber <lucenelist2...@danielnaber.de>
Subject Re: Analysis/tokenization of compound words
Date Wed, 20 Sep 2006 07:07:18 GMT
On Tuesday 19 September 2006 22:15, eks dev wrote:

> Daniel Naber made some work with German dictionaries as well, if I
> recall correctly, maybe he has something that helps

The company I work for offers a commercial Java component for decomposing 
and lemmatizing German words, see http://demo.intrafind.org/LiSa/ for an 
online demo (sorry, page is in German only).

Writing a decomposer is difficult: you need both a large dictionary 
*without* compounds and a set of rules to avoid splitting at too many 
positions. For those who speak German: write a decomposer and use 
"Kotflügel" to test it :-)
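[Editorial sketch, not from the original mail: the naive approach Daniel warns
about can be illustrated with a greedy recursive splitter over a dictionary of
non-compound base words. The class name, method, and sample words are
hypothetical; a real decomposer needs the rule layer he mentions, e.g. for
linking elements and to reject splits like Kot+Flügel.]

```java
import java.util.*;

// Hypothetical sketch: a naive recursive decompounder backed by a
// dictionary that contains only non-compound base words.
public class CompoundSplitter {

    // Returns the parts if the word can be fully covered by dictionary
    // entries (trying the longest prefix first), or null if no
    // decomposition exists. Parts must be at least 2 characters long.
    public static List<String> split(String word, Set<String> dict) {
        if (word.isEmpty()) {
            return new ArrayList<>();          // fully consumed: success
        }
        for (int i = word.length(); i >= 2; i--) {
            String head = word.substring(0, i);
            if (dict.contains(head)) {
                List<String> rest = split(word.substring(i), dict);
                if (rest != null) {
                    rest.add(0, head);         // prepend the matched prefix
                    return rest;
                }
            }
        }
        return null;                           // no covering split found
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("haus", "boot"));
        System.out.println(CompoundSplitter.split("hausboot", dict));
        System.out.println(CompoundSplitter.split("xyz", dict));
    }
}
```

Without the rule layer, a splitter like this happily over-splits whenever a
short dictionary word happens to be a prefix of a longer one, which is exactly
why the dictionary alone is not enough.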

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

