lucene-java-user mailing list archives

From Maximilian Hütter
Subject Re: Lucene and Eastern languages (Japanese, Korean and Chinese)
Date Wed, 25 Jul 2007 09:26:48 GMT
Mathieu Lecarme wrote:
> On Tuesday, 24 July 2007 at 13:01 -0700, Shaw, James wrote:
>> Hi, guys,
>> I found Analyzers for Japanese, Korean and Chinese, but not stemmers;
>> the Snowball stemmers only include European languages.  Does stemming
>> not make sense for ideograph-based languages (i.e., no stemming is
>> needed for Japanese, Korean and Chinese)?
> No.

This is not quite correct. Chinese doesn't need any stemming, but Japanese
is not completely ideograph-based and could use stemming. I doubt anyone
has done this, apart from some commercial software for the Japanese
market. I don't know about Korean.

>> Also for spell checking, does the default Lucene SpellChecker work for
>> Japanese, Korean and Chinese?  Does edit distance make sense for these
>> languages?
> Japanese uses groups of ideograms, but Levenshtein distance doesn't make
> sense with so few characters. But I'm not a CJK expert.
> M.

Edit distance only seems to work with languages written in Latin
characters. Spell checking Chinese, Japanese (and Korean?) is more or
less pointless, as they are entered using input methods, which should
produce "correct" words.
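
To illustrate the point about edit distance, here is a minimal sketch in
Python. It is not Lucene code: the function names edit_distance and
similarity are made up for this example, and the normalized score
(1 - distance / longer length) is only an assumption about how a spell
checker might rank suggestions. It compares a one-letter difference in a
Latin-script word with a one-character difference in a two-character
Japanese word.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over Unicode code points (two-row DP)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized score: 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# One letter out of eight differs: a plausible spelling variant.
latin_score = similarity("analyzer", "analyser")

# One character out of two differs: Tokyo vs. Tohoku, unrelated words.
cjk_score = similarity("東京", "東北")

print(latin_score, cjk_score)
```

Both pairs are a single edit apart, yet under this scoring the Latin pair
stays close while the Japanese pair drops to half, because each character
carries so much more of the word.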

Best regards,


Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel            :  (+49) 0711 - 45 10 17 578
Fax            :  (+49) 0711 - 45 10 17 573
Sitz           :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich
