lucene-dev mailing list archives

From "Christian Moen (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
Date Sun, 07 Oct 2012 02:01:02 GMT


Christian Moen commented on LUCENE-3922:

Is it difficult to support numbers with period as the following?

Supporting this is no problem and a good idea.

I think it would be helpful if this charfilter supported old Kanji numeric characters ("KYU-KANJI"
or "DAIJI") such as 壱, 壹 (One), 弌, 弐, 貳 (Two), 弍, 参, 參 (Three), or made this configurable.

This is also easy to support.
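
As a rough illustration of why this is cheap to add, the old-style characters can be folded
onto the common Kanji numerals before any number parsing runs. The sketch below is Python
rather than the Java a charfilter would be written in, covers only the characters named in
the comment above, and is not taken from the attached patch. The names DAIJI_TO_COMMON and
fold_daiji are hypothetical.

```python
# Map old-style (daiji / kyu-kanji) numerals to the common Kanji numerals.
# Only the characters mentioned in the comment are covered; a real table
# would need more entries.
DAIJI_TO_COMMON = str.maketrans({
    "壱": "一", "壹": "一", "弌": "一",
    "弐": "二", "貳": "二", "弍": "二",
    "参": "三", "參": "三",
})


def fold_daiji(text: str) -> str:
    """Replace each old-style numeral with its common form; leave everything else alone."""
    return text.translate(DAIJI_TO_COMMON)
```

For example, fold_daiji("壱万弐千") gives "一万二千". Characters such as 参 also occur in
ordinary words, so whether to fold them is probably best left as a configurable option.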

As for making the preservation of zeros configurable, that's also possible, of course.

It's great to get more feedback on what sort of functionality we need and what should be configurable
options. Hopefully, we can find a good balance without adding too much complexity.

Thanks for the feedback.
> Add Japanese Kanji number normalization to Kuromoji
> ---------------------------------------------------
>                 Key: LUCENE-3922
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.0-ALPHA
>            Reporter: Kazuaki Hiraga
>              Labels: features
>         Attachments: LUCENE-3922.patch
> Japanese people use Kanji numerals instead of Arabic numerals when writing prices, addresses,
> and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and 十二月 (December).
> So, we would like to normalize those Kanji numerals to Arabic numerals (I don't think we
> need the capability to normalize to Kanji numerals).
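
The normalization described in the issue can be sketched as follows. This is a minimal
character-level illustration in Python, not the approach of the attached patch: it assumes
plain ASCII digits mixed with Kanji digits, the units 十, 百, 千, 万, 億 and 兆, and it drops
leading zeros. The names parse_number and normalize are hypothetical. Because it looks at
characters only, it will also rewrite numerals that are part of ordinary words, which a
tokenizer-aware implementation would have to guard against.

```python
import re

# Digits that combine positionally (so "4800" and "二〇一二" both work).
DIGITS = {ch: i for i, ch in enumerate("〇一二三四五六七八九")}
DIGITS.update({str(i): i for i in range(10)})
# Units that multiply the digits just before them.
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
# Units that close a whole group below 10,000.
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}

NUMBER_RUN = re.compile(
    "[" + "".join(DIGITS) + "".join(SMALL_UNITS) + "".join(LARGE_UNITS) + "]+"
)


def parse_number(run: str) -> int:
    """Convert one run of numeral characters to an int."""
    total = 0       # completed groups (multiples of 万, 億, 兆)
    section = 0     # value accumulated below the next large unit
    pending = None  # digits read but not yet multiplied by a unit
    for ch in run:
        if ch in DIGITS:
            pending = (pending or 0) * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 x 10.
            section += (1 if pending is None else pending) * SMALL_UNITS[ch]
            pending = None
        else:
            group = section + (pending or 0)
            # A bare large unit such as 万 is treated as 1 x 10,000.
            total += (group or 1) * LARGE_UNITS[ch]
            section, pending = 0, None
    return total + section + (pending or 0)


def normalize(text: str) -> str:
    """Replace every numeral run in text with its Arabic form."""
    return NUMBER_RUN.sub(lambda m: str(parse_number(m.group())), text)
```

With the examples from the description, normalize("12万4800円") gives "124800円",
normalize("二番町三ノ二") gives "2番町3ノ2", and normalize("十二月") gives "12月".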
