lucene-java-user mailing list archives

From Magnus Johansson <mag...@technohuman.com>
Subject Re: Analyzing and Querying
Date Fri, 06 Aug 2004 12:38:42 GMT
Swedish is similar. Compound words can be formed from multiple words.
Sometimes the words are joined with an 's' and sometimes not, and there
are a few special cases.

I didn't mean to suggest that it is trivial to implement, but a large
wordlist that includes inflected forms is a good start.
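
To make the wordlist idea concrete, here is a minimal sketch in Java. It is
not the code I used for Swedish, and it is not part of Lucene: the class name
CompoundSplitter, the minimum part length of 3, and the handling of only a
single linking 's' are all assumptions made for illustration. It tries the
longest known word first, backtracks when the remainder cannot be split, and
relies on the wordlist itself to contain inflected forms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical wordlist-based compound splitter (illustration only). */
final class CompoundSplitter {
    // Assumed lower bound; very short parts produce many false splits.
    private static final int MIN_PART = 3;

    // Lowercased wordlist, expected to include inflected forms.
    private final Set<String> words;

    CompoundSplitter(Set<String> words) {
        this.words = words;
    }

    /** Returns the parts of a compound, or the lowercased token alone if no split is found. */
    List<String> split(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        if (splitFrom(lower, 0, parts) && parts.size() > 1) {
            return parts;
        }
        return List.of(lower);
    }

    // Backtracking search: consume known words from 'start' to the end of s.
    private boolean splitFrom(String s, int start, List<String> parts) {
        if (start == s.length()) {
            return true;
        }
        // Try the longest candidate first.
        for (int end = s.length(); end - start >= MIN_PART; end--) {
            String candidate = s.substring(start, end);
            if (!words.contains(candidate)) {
                continue;
            }
            parts.add(candidate);
            // Case 1: the next part follows directly.
            if (splitFrom(s, end, parts)) {
                return true;
            }
            // Case 2: a linking 's' sits between this part and the next one.
            if (end + 1 < s.length() && s.charAt(end) == 's'
                    && splitFrom(s, end + 1, parts)) {
                return true;
            }
            parts.remove(parts.size() - 1); // backtrack
        }
        return false;
    }
}
```

With a wordlist containing "arbeit", "aufwand", "berg" and "bahnen",
split("Arbeitsaufwand") yields [arbeit, aufwand] and split("Bergbahnen")
yields [berg, bahnen]. The special cases mentioned above would still need
their own rules.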

/magnus


Daniel Naber wrote:

>On Friday 06 August 2004 13:28, Magnus Johansson wrote:
>
>>Splitting compound words can be done quite effectively simply by using
>>a large wordlist. I have done this for Swedish.
>
>It is, however, difficult to get right for German. On the one hand there are 
>compounds in German with more than two parts, on the other hand there are 
>extra characters in the middle of some compound words (e.g. Arbeit + Aufwand 
>= ArbeitSaufwand). Also, the compounds have their inflectional endings, e.g. 
>the plural of Bergbahn is Bergbahnen. At http://lemmi.intrafind.org you can 
>see a demo that deals with almost all cases, even things like "dazugekauftes" 
>(but it's not freely available).
>
>Regards
> Daniel
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>  
>

