lucene-dev mailing list archives

From "Christian Moen (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-3901) Add katakana filter to better deal with katakana spelling variants
Date Thu, 22 Mar 2012 10:00:22 GMT
Add katakana filter to better deal with katakana spelling variants
------------------------------------------------------------------

                 Key: LUCENE-3901
                 URL: https://issues.apache.org/jira/browse/LUCENE-3901
             Project: Lucene - Java
          Issue Type: New Feature
          Components: modules/analysis
            Reporter: Christian Moen
             Fix For: 3.6, 4.0


Many Japanese katakana words end in a long sound that is sometimes optional.

For example, パーティー and パーティ are both perfectly valid for "party".  Similarly
we have センター and センタ that are variants of "center" as well as サーバー and
サーバ for "server".

I propose adding a katakana stemmer that removes this trailing long sound mark (ー) from terms
longer than a configurable length.  Adding the variant as a synonym is also possible, but
I think stemming is preferable from a ranking point of view.
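
To make the proposal concrete, here is a minimal sketch of the stemming rule in plain Java, without
any Lucene dependency.  The class name KatakanaStemSketch, the method stem and the parameter
lengthThreshold are hypothetical names chosen for illustration, not part of any existing API.  The
sketch also assumes that only all-katakana terms should be touched and that "longer than" means
strictly longer; neither detail is settled by the description above.

```java
// Illustrative sketch only: all names here are hypothetical, not Lucene API.
final class KatakanaStemSketch {

    // U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, the trailing long sound.
    private static final char PROLONGED_SOUND_MARK = '\u30FC';

    private KatakanaStemSketch() {}

    /**
     * Removes one trailing prolonged sound mark from an all-katakana term
     * that is strictly longer than lengthThreshold characters.
     * Any other term is returned unchanged.
     */
    static String stem(String term, int lengthThreshold) {
        int length = term.length();
        if (length > lengthThreshold
                && term.charAt(length - 1) == PROLONGED_SOUND_MARK
                && isKatakana(term)) {
            return term.substring(0, length - 1);
        }
        return term;
    }

    // Assumption: a term qualifies only if every char is in the Unicode
    // Katakana block (U+30A0..U+30FF), which includes the sound mark itself.
    private static boolean isKatakana(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.UnicodeBlock.of(term.charAt(i))
                    != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }
}
```

With a threshold of 3, stem("パーティー", 3) and stem("パーティ", 3) would both yield パーティ, so the
two spellings index to the same term, while a term of three characters or fewer would be left as is.
The threshold exists to avoid stemming short words where the long sound is more likely to be
significant; the value 3 is only an example.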


