lucene-java-user mailing list archives

From Daniel Naber <>
Subject Re: Analysis/tokenization of compound words
Date Wed, 20 Sep 2006 07:07:18 GMT
On Tuesday 19 September 2006 22:15, eks dev wrote:

> Daniel Naber did some work with German dictionaries as well, if I
> recall correctly, maybe he has something that helps

The company I work for offers a commercial Java component for decomposing 
and lemmatizing German words, see for an 
online demo (sorry, page is in German only).

Writing a decomposer is difficult as you need both a large dictionary 
*without* compounds and a set of rules to avoid splitting at too many 
positions. For those who speak German: write a decomposer and use 
"Kotflügel" to test it :-)
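The dictionary-plus-rules idea above can be sketched as a naive recursive splitter. This is only an illustration, not the commercial component mentioned in the message: the class name, the tiny dictionary, and the minimum-part-length rule are all assumptions. The minimum length is one crude example of a rule that keeps the splitter from splitting at too many positions; a real decomposer also needs to handle German linking elements ("Fugenelemente" such as -s- or -n-) and to rank competing splits.

```java
import java.util.*;

// Naive recursive decompounder (illustrative sketch, not a real product):
// tries split positions left to right and accepts a split only if every
// part is in the dictionary and at least minPart characters long.
public class NaiveDecompounder {
    private final Set<String> dict;
    private final int minPart; // crude anti-over-splitting rule

    public NaiveDecompounder(Set<String> dict, int minPart) {
        this.dict = dict;
        this.minPart = minPart;
    }

    // Returns one valid decomposition, or null if none is found.
    public List<String> split(String word) {
        String w = word.toLowerCase(Locale.GERMAN);
        if (dict.contains(w)) {
            return new ArrayList<>(List.of(w));
        }
        for (int i = minPart; i <= w.length() - minPart; i++) {
            String head = w.substring(0, i);
            if (!dict.contains(head)) {
                continue;
            }
            List<String> rest = split(w.substring(i));
            if (rest != null) {
                rest.add(0, head);
                return rest;
            }
        }
        return null; // no decomposition found
    }

    public static void main(String[] args) {
        // Dictionary of simplex words only, i.e. *without* compounds.
        Set<String> dict = new HashSet<>(List.of("kot", "flügel", "rad"));
        NaiveDecompounder d = new NaiveDecompounder(dict, 3);
        System.out.println(d.split("kotflügel")); // [kot, flügel]
    }
}
```

Note that the dictionary must contain only non-compound words, exactly as the message says: if "kotflügel" itself were listed, the splitter would simply return it whole and never decompose anything.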



