lucene-java-user mailing list archives

From Daniel Naber <daniel.na...@intrafind.de>
Subject Re: Analyzing and Querying
Date Fri, 06 Aug 2004 12:04:21 GMT
On Friday 06 August 2004 13:28, Magnus Johansson wrote:

> Splitting compound words can be done quite effectively simply by using
> a large wordlist. I have done this for Swedish.

It is, however, difficult to get right for German. For one thing, some German
compounds have more than two parts; for another, some compounds contain an
extra linking character between their parts (e.g. Arbeit + Aufwand =
Arbeitsaufwand, with a linking "s"). Also, compounds carry inflectional
endings, e.g. the plural of Bergbahn is Bergbahnen. At
http://lemmi.intrafind.org you can see a demo that handles almost all cases,
even words like "dazugekauftes" (but it's not freely available).
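To make the three German problems concrete, here is a minimal sketch of wordlist-based splitting that handles more than two parts, a linking "s", and an inflectional ending on the last part. It is not the lemmi demo's method and not a Lucene API. The class name CompoundSplitter, the tiny WORDS list, and the LINKERS and ENDINGS arrays are hypothetical and stand in for a real lexicon and real morphology rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompoundSplitter {
    // Hypothetical toy wordlist; a real splitter would load a large lexicon.
    private static final Set<String> WORDS = new HashSet<>(Arrays.asList(
            "arbeit", "aufwand", "berg", "bahn", "station"));
    // Assumed linking elements allowed between parts ("" = none, "s" as in Arbeitsaufwand).
    private static final String[] LINKERS = {"", "s"};
    // Assumed inflectional endings allowed on the last part (e.g. Bahn -> Bahnen).
    private static final String[] ENDINGS = {"", "en", "n", "e", "s"};
    // Minimum length of a part, to avoid splitting off tiny fragments.
    private static final int MIN_PART = 3;

    /** Returns the lowercased parts of a word, or null if it cannot be split. */
    public static List<String> split(String word) {
        return splitFrom(word.toLowerCase(), 0);
    }

    private static List<String> splitFrom(String w, int start) {
        // Case 1: the remainder is one known word, possibly with an inflectional ending.
        String rest = w.substring(start);
        for (String ending : ENDINGS) {
            if (rest.endsWith(ending)) {
                String stem = rest.substring(0, rest.length() - ending.length());
                if (stem.length() >= MIN_PART && WORDS.contains(stem)) {
                    List<String> parts = new ArrayList<>();
                    parts.add(stem);
                    return parts;
                }
            }
        }
        // Case 2: a known word, then an optional linking element, then further parts.
        for (int end = start + MIN_PART; end < w.length(); end++) {
            String head = w.substring(start, end);
            if (!WORDS.contains(head)) {
                continue;
            }
            for (String linker : LINKERS) {
                int next = end + linker.length();
                if (next < w.length() && w.startsWith(linker, end)) {
                    List<String> tail = splitFrom(w, next); // recursion allows 3+ parts
                    if (tail != null) {
                        tail.add(0, head);
                        return tail;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(split("Arbeitsaufwand"));  // [arbeit, aufwand]
        System.out.println(split("Bergbahnen"));      // [berg, bahn]
        System.out.println(split("Bergbahnstation")); // [berg, bahn, station]
    }
}
```

The search takes the first split it finds, so with a large wordlist it can pick a wrong segmentation; a real splitter would need to rank alternatives. Verb prefixes and participle forms such as "dazugekauftes" are well beyond this sketch.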

Regards
 Daniel



