lucy-issues mailing list archives

From "Nick Wellnhofer (Updated) (JIRA)" <j...@apache.org>
Subject [lucy-issues] [jira] [Updated] (LUCY-191) Unicode normalization
Date Sat, 19 Nov 2011 17:08:51 GMT

     [ https://issues.apache.org/jira/browse/LUCY-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Wellnhofer updated LUCY-191:
---------------------------------

    Attachment: LUCY-191-normalizer.patch

Initial implementation of Lucy::Analysis::Normalizer
                
> Unicode normalization
> ---------------------
>
>                 Key: LUCY-191
>                 URL: https://issues.apache.org/jira/browse/LUCY-191
>             Project: Lucy
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Nick Wellnhofer
>            Priority: Minor
>              Labels: patch
>         Attachments: LUCY-191-normalizer.patch
>
>
> As discussed on the mailing list, it would be nice to have Unicode normalization, Unicode
> case folding, and stripping of accents as part of the analyzer chain. With the help of utf8proc,
> this can be done in one pass. So I proposed a new analyzer, Lucy::Analysis::Normalizer, with
> an interface described here:
> http://mail-archives.apache.org/mod_mbox/incubator-lucy-dev/201111.mbox/%3C4EC43816.1070107%40aevum.de%3E
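
To illustrate what the three operations named in the description do to a token, here is a minimal sketch using Python's standard unicodedata module. It is neither the attached patch nor the utf8proc API: the function name `normalize_text` and its parameters are hypothetical, and it makes several passes over the string, whereas the description says utf8proc can do the work in one pass.

```python
import unicodedata


def normalize_text(text, form="NFKC", case_fold=True, strip_accents=False):
    """Illustrative only: hypothetical helper, not Lucy's or utf8proc's API."""
    if strip_accents:
        # Decompose so that accents become separate combining marks,
        # then drop every character with a nonzero combining class.
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if case_fold:
        # Unicode case folding (more aggressive than lower()).
        text = text.casefold()
    # Finish with the requested normalization form.
    return unicodedata.normalize(form, text)


print(normalize_text("Café", strip_accents=True))  # cafe
print(normalize_text("Stra\u00dfe"))               # strasse
```

Accent stripping is done on the decomposed form because precomposed characters such as U+00E9 carry no separate combining mark to remove.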

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
