lucene-dev mailing list archives

From "Kazuaki Hiraga (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
Date Mon, 26 Mar 2012 08:24:25 GMT
Add decompose compound Japanese Katakana token capability to Kuromoji
---------------------------------------------------------------------

                 Key: LUCENE-3921
                 URL: https://issues.apache.org/jira/browse/LUCENE-3921
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 4.0
         Environment: CentOS 5, IPA Dictionary
            Reporter: Kazuaki Hiraga


Kuromoji, the Japanese morphological analyzer, cannot decompose every compound Japanese
Katakana token into sub-tokens. Some Katakana tokens are decomposed, but the decomposition
is not applied to every Katakana compound. For instance, "トートバッグ" (tote bag) and
"ショルダーバッグ" (shoulder bag) are not decomposed into "トート バッグ" and
"ショルダー バッグ", although the IPA dictionary has an entry for "バッグ". I would like
the decompose feature to apply to every Katakana token whose sub-tokens are in the dictionary,
or a capability to force the decompose feature onto every Katakana token.
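
To make the requested behavior concrete, here is a minimal sketch of dictionary-driven
decomposition in plain Java. It is not Kuromoji's implementation and uses no Lucene API. The
class name KatakanaDecomposer and the method decompose are hypothetical, and the dictionary
entries "トート" and "ショルダー" are assumed for illustration (the report only states that
"バッグ" is in the IPA dictionary). The sketch splits an all-Katakana token into the fewest
dictionary words that cover it, and leaves the token unchanged when no such split exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Kuromoji or Lucene.
class KatakanaDecomposer {
    private final Set<String> dictionary;
    private final int maxWordLength;

    KatakanaDecomposer(Set<String> dictionary) {
        this.dictionary = new HashSet<>(dictionary);
        int max = 0;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    // True if every char is in the Unicode Katakana block (includes the long-vowel mark).
    static boolean isKatakana(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.UnicodeBlock.of(s.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }

    // Splits a Katakana token into the fewest dictionary words covering it.
    // Returns the token unchanged if it is not Katakana or cannot be split.
    List<String> decompose(String token) {
        if (token.isEmpty() || !isKatakana(token)) {
            return Collections.singletonList(token);
        }
        int n = token.length();
        int[] best = new int[n + 1];  // best[i]: fewest words covering token[0, i)
        int[] prev = new int[n + 1];  // prev[i]: start index of the last word ending at i
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = Math.max(0, i - maxWordLength); j < i; j++) {
                // Skip the whole token so that a compound listed in the
                // dictionary is still split into its parts.
                if (j == 0 && i == n) continue;
                if (best[j] == Integer.MAX_VALUE) continue;
                if (dictionary.contains(token.substring(j, i)) && best[j] + 1 < best[i]) {
                    best[i] = best[j] + 1;
                    prev[i] = j;
                }
            }
        }
        if (best[n] == Integer.MAX_VALUE) {
            return Collections.singletonList(token);
        }
        List<String> parts = new ArrayList<>();
        for (int pos = n; pos > 0; pos = prev[pos]) {
            parts.add(token.substring(prev[pos], pos));
        }
        Collections.reverse(parts);
        return parts;
    }

    public static void main(String[] args) {
        // Assumed dictionary entries, for illustration only.
        KatakanaDecomposer d = new KatakanaDecomposer(
                new HashSet<>(Arrays.asList("トート", "バッグ", "ショルダー")));
        System.out.println(d.decompose("トートバッグ"));
        System.out.println(d.decompose("ショルダーバッグ"));
    }
}
```

Skipping the whole-token match is one possible reading of "force apply": a compound that is
itself a dictionary entry is still split whenever its parts are also entries. A real change
to Kuromoji would more likely adjust the tokenizer's path costs than post-process its tokens.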


