Swedish is similar. Compound words can be formed from several words.
Sometimes the parts are joined with an 's' and sometimes not, and there
are a few special cases.

I didn't mean to suggest that it is trivial to implement, but a large
wordlist with inflections is a good start.
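
A minimal sketch of the wordlist idea, for illustration only. It is not
the actual Swedish implementation: the names WORDS, LINKERS, MIN_PART and
split_compound are hypothetical, the wordlist is a toy one, and the only
linking element handled is an optional 's'. Inflected forms are covered
only if they are in the wordlist, which is why a list with inflections
matters.

```python
from functools import lru_cache

# Hypothetical toy wordlist. A real one would be large and include inflections.
WORDS = {"arbeit", "aufwand", "berg", "bahn", "bahnen",
         "fot", "boll", "fotboll", "lag"}
LINKERS = ("", "s")   # parts are joined directly or with a linking 's'
MIN_PART = 3          # assumption: ignore heads shorter than this


def split_compound(word, words=WORDS):
    """Split `word` into known parts, or return None if no split is found.

    A word that is itself in the wordlist is returned unsplit. Otherwise
    the split with the fewest parts is preferred.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Returns a tuple of parts covering word[start:], or None.
        rest = word[start:]
        if rest in words:
            return (rest,)
        best = None
        for end in range(start + MIN_PART, len(word) - MIN_PART + 1):
            head = word[start:end]
            if head not in words:
                continue
            for link in LINKERS:
                # The linking element, if any, must follow the head.
                if not word.startswith(link, end):
                    continue
                tail = solve(end + len(link))
                if tail is not None:
                    candidate = (head,) + tail
                    if best is None or len(candidate) < len(best):
                        best = candidate
        return best

    result = solve(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    for w in ("Arbeitsaufwand", "Bergbahnen", "fotbollslag", "xyz"):
        print(w, "->", split_compound(w))
```

Preferring the fewest parts is one simple way to avoid over-splitting
(e.g. keeping "fotboll" whole when it is in the list); a real splitter
would need better ranking and the special cases mentioned above.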
/magnus
Daniel Naber wrote:
>On Friday 06 August 2004 13:28, Magnus Johansson wrote:
>
>
>
>>Splitting compound words can be done quite effectively simply by using
>>a large wordlist. I have done this for Swedish.
>>
>>
>
>It is, however, difficult to get right for German. On the one hand there are
>compounds in German with more than two parts, on the other hand there are
>extra characters in the middle of some compound words (e.g. Arbeit + Aufwand
>= ArbeitSaufwand). Also, the compounds have their inflectional endings, e.g.
>the plural of Bergbahn is Bergbahnen. At http://lemmi.intrafind.org you can
>see a demo that deals with almost all cases, even things like "dazugekauftes"
>(but it's not freely available).
>
>Regards
> Daniel
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>