lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4381) support unicode 6.2
Date Wed, 12 Sep 2012 23:46:07 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4381:
--------------------------------

    Attachment: LUCENE-4381.patch

A hacked-up patch for testing:

I think it's nice to offer the CJK dictionary-based segmentation as an option. I'm not sure yet
how good the results will be on average (maybe I can enlist Christian to help investigate).

So as a test I just added a boolean option which, if enabled, keeps all Han/Hiragana/Katakana
marked as "Chinese/Japanese" (it uses the ISO 15924 Japanese code, but I overrode toString to
try to prevent confusion).
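
To make the idea concrete, here is a rough sketch of what such a boolean option does. It is not the code from the patch: it uses the JDK's Character.UnicodeScript instead of ICU, and the class name, method names, and the mergeCjk flag are all made up for illustration.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from LUCENE-4381.patch.
final class CjkScriptRuns {
    // Label used when Han/Hiragana/Katakana are kept together.
    static final String CJ = "Chinese/Japanese";

    // Returns the script label for one code point. With mergeCjk enabled,
    // Han, Hiragana and Katakana all map to the same label.
    static String label(int codePoint, boolean mergeCjk) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        boolean cjk = script == UnicodeScript.HAN
                || script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA;
        return (mergeCjk && cjk) ? CJ : script.name();
    }

    // Splits text into runs of identical labels, formatted as "LABEL:text".
    static List<String> runs(String text, boolean mergeCjk) {
        List<String> out = new ArrayList<>();
        String current = null;
        int start = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String next = label(cp, mergeCjk);
            if (current != null && !next.equals(current)) {
                out.add(current + ":" + text.substring(start, i));
                start = i;
            }
            current = next;
            i += Character.charCount(cp);
        }
        if (current != null) {
            out.add(current + ":" + text.substring(start));
        }
        return out;
    }
}
```

With the flag off, a mixed Japanese string breaks at every Han/Hiragana/Katakana boundary; with it on, the whole string stays in one run that a dictionary-based segmenter could then process as a unit.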

Seems to work ok: some trivial snippets from smartcn and kuromoji are analyzed fine, and testRandomStrings
is happy :)
                
> support unicode 6.2
> -------------------
>
>                 Key: LUCENE-4381
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4381
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Robert Muir
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4381.patch
>
>
> ICU will release a new version in about a month.
> They have a version for testing (http://site.icu-project.org/download/milestone) already
> out with some interesting features, e.g. dictionary-based CJK segmentation.
> This issue is just to test it out/integrate the new stuff/etc. We should try out the
> automation Steve did as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

