lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Muir <rcm...@gmail.com>
Subject Re: ICU Chinese words
Date Sun, 24 Apr 2011 03:45:52 GMT
2011/4/23 Weiwei Wang <ww.wang.cs@gmail.com>:
> hi,all
>      I'm working on a Chinese contact search project, I need to transform
> the Chinese words to its Pinyin form.
>
> e.g.
>  中国--> zhongguo
>
> The problem I encounter is that for some chinese words which have more than
> one transforms, like. 贾-> jia, 贾->gu, ...
>
> I already used the ICUTransformFilter(Han->Latin/Names),how could i get all
> the transforms instead just one of them?
>

Maybe use the unihan database (e.g. generate synonyms or something
from it, or make a special filter) ?

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%E8%B4%BE
kMandarin	JIA3 GU3 JIA4

you can download this as a zip file.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message