lucene-dev mailing list archives

From "Christian Moen (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3915) Add Japanese filter to replace term attribute with readings
Date Sat, 24 Mar 2012 21:06:25 GMT


Christian Moen commented on LUCENE-3915:

Find attached a draft patch for a filter that replaces the term attribute with the token's reading.  I saw in Ohtani-san's
Twitter feed that Koji has checked this functionality into lucene-gosen, so I'm providing
a similar patch here in the hope that it will support the Japanese spell-checking work.
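To illustrate what such a filter does, here is a minimal, self-contained sketch. It deliberately avoids Lucene's analysis API: the `Token` record and the `replaceWithReadings` method are hypothetical stand-ins for a token stream whose tokens carry a surface form plus a katakana reading. The readings are hard-coded, as if they had come from the tokenizer. Tokens without a reading keep their surface form; that is one plausible choice, not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models a token stream as a list of (surface, reading)
// pairs instead of using Lucene's TokenStream/CharTermAttribute API.
public class ReadingReplaceSketch {

    // One analyzed token: the surface form as written and its katakana reading,
    // or null when no reading is available (e.g. unknown words).
    public record Token(String surface, String reading) {}

    // Emits the reading in place of each surface form, keeping the surface
    // form when the token has no reading.
    public static List<String> replaceWithReadings(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.reading() != null ? t.reading() : t.surface());
        }
        return out;
    }

    public static void main(String[] args) {
        // Readings are hard-coded here, assumed to come from the tokenizer.
        List<Token> tokens = List.of(
            new Token("寿司", "スシ"),
            new Token("が", "ガ"),
            new Token("好き", "スキ"),
            new Token("Lucene", null));
        System.out.println(replaceWithReadings(tokens)); // [スシ, ガ, スキ, Lucene]
    }
}
```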

This patch can also convert katakana readings to romaji, and it might make sense to use a romaji
representation for the spell-checking.  We probably also need to handle misspellings that are
split into several tokens; these would need to be recomposed using their readings before
we do the matching.
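A rough sketch of both ideas, assuming readings arrive as katakana strings. `toRomaji` is a hypothetical, table-driven, Hepburn-style converter that covers only the basic katakana syllabary, the small ャ/ュ/ョ digraphs, ッ and ー; it is not the converter in the patch and ignores details such as the `n'` apostrophe. `recompose` (also hypothetical) simply concatenates the readings of the tokens a misspelling was split into before romanizing, so the joined form can be matched as a single word.

```java
import java.util.List;

// Hypothetical sketch: a simplified Hepburn-style katakana-to-romaji converter.
// Only the basic syllabary, small ャ/ュ/ョ digraphs, ッ and ー are handled;
// any other character is passed through unchanged.
public class KatakanaRomajiSketch {

    private static final String KANA =
        "アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲン"
        + "ガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポ";

    // ROMA[i] is the romanization of KANA.charAt(i).
    private static final String[] ROMA = {
        "a", "i", "u", "e", "o", "ka", "ki", "ku", "ke", "ko",
        "sa", "shi", "su", "se", "so", "ta", "chi", "tsu", "te", "to",
        "na", "ni", "nu", "ne", "no", "ha", "hi", "fu", "he", "ho",
        "ma", "mi", "mu", "me", "mo", "ya", "yu", "yo",
        "ra", "ri", "ru", "re", "ro", "wa", "wo", "n",
        "ga", "gi", "gu", "ge", "go", "za", "ji", "zu", "ze", "zo",
        "da", "ji", "zu", "de", "do", "ba", "bi", "bu", "be", "bo",
        "pa", "pi", "pu", "pe", "po"};

    public static String toRomaji(String katakana) {
        StringBuilder sb = new StringBuilder();
        boolean geminate = false; // set by ッ: double the next consonant
        for (int i = 0; i < katakana.length(); i++) {
            char c = katakana.charAt(i);
            if (c == 'ッ') {
                geminate = true;
                continue;
            }
            if (c == 'ー') {
                // Long vowel mark: repeat the previously written letter.
                if (sb.length() > 0) {
                    sb.append(sb.charAt(sb.length() - 1));
                }
                continue;
            }
            int idx = KANA.indexOf(c);
            if (idx < 0) {
                sb.append(c); // not covered by the table: pass through
                geminate = false;
                continue;
            }
            String syllable = ROMA[idx];
            char next = i + 1 < katakana.length() ? katakana.charAt(i + 1) : '\0';
            int small = "ャュョ".indexOf(next);
            if (small >= 0 && syllable.endsWith("i")) {
                String vowel = "auo".substring(small, small + 1);
                String stem = syllable.substring(0, syllable.length() - 1);
                // シャ -> sha, チャ -> cha, ジャ -> ja; otherwise キャ -> kya
                boolean palatal = stem.equals("sh") || stem.equals("ch") || stem.equals("j");
                syllable = palatal ? stem + vowel : stem + "y" + vowel;
                i++; // the small kana has been consumed
            }
            if (geminate) {
                sb.append(syllable.charAt(0));
                geminate = false;
            }
            sb.append(syllable);
        }
        return sb.toString();
    }

    // Joins the readings of the tokens a word was split into, then romanizes the result.
    public static String recompose(List<String> readings) {
        return toRomaji(String.join("", readings));
    }

    public static void main(String[] args) {
        System.out.println(toRomaji("スシ"));                     // sushi
        System.out.println(toRomaji("トウキョウ"));               // toukyou
        System.out.println(toRomaji("キッテ"));                   // kitte
        System.out.println(recompose(List.of("トウ", "キョウ"))); // toukyou
    }
}
```

Romanizing after recomposition means the edit distance is computed over Latin letters of the whole word, rather than over separate kana fragments that a dictionary entry would never match on its own.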

Just some thoughts...
> Add Japanese filter to replace term attribute with readings
> -----------------------------------------------------------
>                 Key: LUCENE-3915
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Christian Moen
>            Priority: Minor
>         Attachments: LUCENE-3915.patch
> Koji and Robert are working on LUCENE-3888, which allows spell-checkers to do their similarity
> matching using a form of the word other than its surface form.
> This approach is very useful for languages such as Japanese, where the surface form and
> the form we'd like to use for similarity matching are very different.  For Japanese, it's useful
> to use readings for this, probably with some normalization.
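The motivation can be made concrete with a plain Levenshtein distance, used here only as an example similarity measure and not necessarily what LUCENE-3888 uses. The reading スシ for 寿司 is hard-coded as if it came from the tokenizer, and `toKatakana` is a hypothetical normalization step that maps hiragana to katakana by a fixed code point offset, standing in for the reading of a kana-only query.

```java
// Hypothetical sketch: compares edit distance on surface forms vs. readings.
public class ReadingDistanceSketch {

    // Classic two-row Levenshtein distance over UTF-16 chars
    // (all characters used here are in the BMP).
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Simple normalization: hiragana U+3041..U+3096 map to katakana U+30A1..U+30F6.
    public static String toKatakana(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append(c >= '\u3041' && c <= '\u3096' ? (char) (c + 0x60) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String indexedSurface = "寿司";
        String indexedReading = "スシ"; // assumed tokenizer reading for 寿司
        for (String query : new String[] {"すし", "すす"}) {
            int surfaceDist = levenshtein(indexedSurface, query);
            int readingDist = levenshtein(indexedReading, toKatakana(query));
            System.out.println(query + ": surface=" + surfaceDist + ", reading=" + readingDist);
        }
        // すし: surface=2, reading=0
        // すす: surface=2, reading=1
    }
}
```

On surface forms both queries are equally far from 寿司, whereas on normalized readings the correct spelling matches exactly and the one-kana typo is a single edit away.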
