lucene-dev mailing list archives

From "Kazuaki Hiraga (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
Date Sat, 06 Oct 2012 16:59:03 GMT


Kazuaki Hiraga commented on LUCENE-3922:

Sorry for the late reply.

Although I have a few requests for improvements, this is a very helpful and nice CharFilter
for me.
Thank you, Christian!

My requests are the following:

Would it be difficult to support numbers with a decimal point, such as the following?

On the other hand, I agree with Christian about not preserving leading zeros, so "◯◯七"
doesn't need to become "007".
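
To make the agreed behaviour concrete, here is a minimal Java sketch of digit-by-digit conversion that drops leading zeros, so "◯◯七" becomes "7" rather than "007". KanjiDigits and toArabic are hypothetical names used only for illustration (they are not part of the attached patch or of Kuromoji), the sketch ignores multipliers such as 十 or 万, and treating both 〇 (U+3007) and ◯ (U+25EF) as zero is an assumption.

```java
import java.util.Map;

final class KanjiDigits {
    // Hypothetical helper: positional kanji digits only (no 十/百/千/万 multipliers).
    // Assumption: both U+3007 (ideographic zero) and U+25EF (large circle,
    // often typed in its place) are read as zero.
    private static final Map<Character, Character> TO_ASCII = Map.ofEntries(
        Map.entry('\u3007', '0'), Map.entry('\u25EF', '0'),
        Map.entry('一', '1'), Map.entry('二', '2'), Map.entry('三', '3'),
        Map.entry('四', '4'), Map.entry('五', '5'), Map.entry('六', '6'),
        Map.entry('七', '7'), Map.entry('八', '8'), Map.entry('九', '9'));

    /** Converts a run of positional kanji digits to ASCII digits, dropping leading zeros. */
    static String toArabic(String kanji) {
        StringBuilder sb = new StringBuilder(kanji.length());
        for (int i = 0; i < kanji.length(); i++) {
            Character d = TO_ASCII.get(kanji.charAt(i));
            if (d == null) {
                throw new IllegalArgumentException("not a kanji digit: " + kanji.charAt(i));
            }
            sb.append(d.charValue());
        }
        // Strip leading zeros, but keep a single "0" if every digit was zero.
        int start = 0;
        while (start < sb.length() - 1 && sb.charAt(start) == '0') {
            start++;
        }
        return sb.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(toArabic("\u25EF\u25EF七")); // 7
    }
}
```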

I think it would be helpful if this CharFilter supported old Kanji numeral characters ("KYU-KANJI"
or "DAIJI") such as 壱, 壹, 弌 (One), 弐, 貳, 弍 (Two) and 参, 參 (Three), or made that support configurable.
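
A configurable variant could be as simple as mapping the old forms onto the modern numerals before any parsing happens. The Java sketch below assumes a plain boolean switch and covers only the characters listed above; DaijiNormalizer is a hypothetical name, not an existing Lucene or Kuromoji class.

```java
import java.util.Map;

final class DaijiNormalizer {
    // Hypothetical mapping of old/formal numerals to modern ones.
    // Only the characters listed in the comment are covered; the list is not exhaustive.
    private static final Map<Character, Character> DAIJI = Map.of(
        '壱', '一', '壹', '一', '弌', '一',
        '弐', '二', '貳', '二', '弍', '二',
        '参', '三', '參', '三');

    /** Rewrites daiji to modern numerals when enabled; otherwise returns the input unchanged. */
    static String normalize(String text, boolean enabled) {
        if (!enabled) {
            return text;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Characters without a mapping are copied through unchanged.
            char mapped = DAIJI.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("壱万弐千", true));  // 一万二千
        System.out.println(normalize("壱万弐千", false)); // 壱万弐千
    }
}
```

Running this step first would let a numeral parser only deal with the modern characters.
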
> Add Japanese Kanji number normalization to Kuromoji
> ---------------------------------------------------
>                 Key: LUCENE-3922
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.0-ALPHA
>            Reporter: Kazuaki Hiraga
>              Labels: features
>         Attachments: LUCENE-3922.patch
> Japanese people use Kanji numerals instead of Arabic numerals for writing prices, addresses
> and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and 十二月 (December).
> So, we would like to normalize those Kanji numerals to Arabic numerals (I don't think we
> need a capability to normalize to Kanji numerals).
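
As a rough illustration of the requested normalization, the following Java sketch parses runs of Kanji numerals, optionally mixed with Arabic digits as in 12万4800, using the multipliers 十, 百, 千 and the group units 万, 億, 兆. KanjiNumberNormalizer is a hypothetical class, not the CharFilter attached to this issue, and it is deliberately naive: it would also turn the 二 in the place name 二番町 into 2, and it handles neither decimal points nor old Kanji forms.

```java
import java.util.Map;

final class KanjiNumberNormalizer {
    // Kanji digit values; ASCII digits are handled separately in digit().
    private static final Map<Character, Integer> DIGITS = Map.ofEntries(
        Map.entry('\u3007', 0), // ideographic zero
        Map.entry('\u25EF', 0), // large circle, assumed to be used as zero
        Map.entry('一', 1), Map.entry('二', 2), Map.entry('三', 3),
        Map.entry('四', 4), Map.entry('五', 5), Map.entry('六', 6),
        Map.entry('七', 7), Map.entry('八', 8), Map.entry('九', 9));
    // Multipliers that apply within a four-digit group.
    private static final Map<Character, Long> SMALL = Map.of('十', 10L, '百', 100L, '千', 1000L);
    // Units that close a four-digit group.
    private static final Map<Character, Long> LARGE =
        Map.of('万', 10_000L, '億', 100_000_000L, '兆', 1_000_000_000_000L);

    private static int digit(char c) {
        if (c >= '0' && c <= '9') {
            return c - '0';
        }
        Integer d = DIGITS.get(c);
        return d == null ? -1 : d;
    }

    private static boolean isNumeral(char c) {
        return digit(c) >= 0 || SMALL.containsKey(c) || LARGE.containsKey(c);
    }

    /** Parses one run of numeral characters, e.g. "12万4800" or "二千十二". */
    static long parse(String run) {
        long total = 0;   // value of completed 万/億/兆 groups
        long section = 0; // value accumulated below the next group unit
        long current = 0; // digits read positionally since the last multiplier
        boolean hasCurrent = false;
        for (int i = 0; i < run.length(); i++) {
            char c = run.charAt(i);
            int d = digit(c);
            if (d >= 0) {
                current = current * 10 + d;
                hasCurrent = true;
            } else if (SMALL.containsKey(c)) {
                // A bare 十/百/千 means one of that unit (十二 = 12).
                section += (hasCurrent ? current : 1) * SMALL.get(c);
                current = 0;
                hasCurrent = false;
            } else if (LARGE.containsKey(c)) {
                section += current;
                total += (section == 0 ? 1 : section) * LARGE.get(c);
                section = 0;
                current = 0;
                hasCurrent = false;
            } else {
                throw new IllegalArgumentException("not a numeral: " + c);
            }
        }
        return total + section + current;
    }

    /** Replaces each numeral run that contains a non-ASCII character with its Arabic value. */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (!isNumeral(text.charAt(i))) {
                out.append(text.charAt(i++));
                continue;
            }
            int start = i;
            boolean hasKanji = false;
            while (i < text.length() && isNumeral(text.charAt(i))) {
                if (text.charAt(i) > 0x7F) {
                    hasKanji = true;
                }
                i++;
            }
            String run = text.substring(start, i);
            // Pure ASCII runs such as "2012" are left untouched.
            out.append(hasKanji ? Long.toString(parse(run)) : run);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("12万4800円")); // 124800円
        System.out.println(normalize("十二月"));     // 12月
    }
}
```

Splitting the value into completed groups, the current group and the pending digits is what lets mixed forms such as 12万4800 and pure Kanji forms such as 二千十二 go through the same loop.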
