lucene-solr-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Created: (SOLR-1571) unicode collation support
Date Wed, 18 Nov 2009 06:00:42 GMT
unicode collation support
-------------------------

                 Key: SOLR-1571
                 URL: https://issues.apache.org/jira/browse/SOLR-1571
             Project: Solr
          Issue Type: New Feature
          Components: Analysis
            Reporter: Robert Muir
            Priority: Minor
         Attachments: SOLR-1571.patch

This patch adds support for Unicode collation (searching and sorting).
Unicode collation is helpful in a search engine because, for many languages, you want things
to match or sort differently.
You might even want to use copyField and support different sort orders/matching schemes if
you need to support multiple languages.

This is simply a factory for Lucene's CollationKeyFilter, which indexes binary collation keys
in a special format that preserves binary sort order.
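
To illustrate why sorting by binary collation keys works, here is a rough JDK-only sketch. It
does not use CollationKeyFilter or the factory from the patch, and it leaves out the special
index encoding mentioned above. The class and method names (CollationKeySketch, sortKey,
compareKeys) are made up for this example. It only shows that comparing the key bytes gives
the same order as the Collator itself.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration class; not part of the patch or of Lucene.
final class CollationKeySketch {

    /** Returns the binary collation key for a term under the given collator. */
    static byte[] sortKey(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    /** Compares two keys byte by byte, treating every byte as unsigned. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // The locale is named explicitly; the JVM default is never consulted.
        Collator german = Collator.getInstance(Locale.GERMANY);
        String apfel = "\u00C4pfel"; // A-umlaut followed by "pfel"
        String zebra = "Zebra";

        // Plain code-unit order versus the order of the binary collation keys.
        int raw = Integer.signum(apfel.compareTo(zebra));
        int keyed = Integer.signum(
                compareKeys(sortKey(german, apfel), sortKey(german, zebra)));
        System.out.println("String.compareTo sign: " + raw);
        System.out.println("collation key sign:    " + keyed);
    }
}
```

Because the keys compare correctly as plain bytes, nothing needs a Collator at sort time;
the collation cost is paid once, when the key is built.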

I've added support for creating a Collator in two ways:
* system collator from a Locale spec (language + country + variant)
* tailored collator from custom rules in a text file

There is deliberately no option to use the "default" locale of the JVM (I consider this a bit
dangerous).
In this patch, it is mandatory to define the locale explicitly for a system collator.
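
The two ways of creating a Collator can be sketched with plain JDK classes as follows. This is
only an illustration under assumptions, not the code in the patch: the class and method names
(CollatorSourcesSketch, systemCollator, tailoredCollator) are hypothetical, the rules are passed
in as a string rather than read from a text file, and the check that rejects a missing language
is my guess at what "mandatory" could look like.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical illustration class; not the factory from the patch.
final class CollatorSourcesSketch {

    /** System collator from an explicit locale spec (language + country + variant). */
    static Collator systemCollator(String language, String country, String variant) {
        if (language == null || language.isEmpty()) {
            // Assumption: refuse to fall back to the JVM default locale.
            throw new IllegalArgumentException("a language must be given explicitly");
        }
        Locale locale = new Locale.Builder()
                .setLanguage(language)
                .setRegion(country)   // null or empty leaves the region unset
                .setVariant(variant)  // null or empty leaves the variant unset
                .build();
        return Collator.getInstance(locale);
    }

    /** Tailored collator from custom rules, e.g. the contents of a rules file. */
    static Collator tailoredCollator(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        Collator german = systemCollator("de", "DE", null);
        System.out.println("de_DE, a vs b: " + Integer.signum(german.compare("a", "b")));

        // Toy rules that reverse the usual order of three letters.
        Collator reversed = tailoredCollator("< c < b < a");
        System.out.println("tailored, c vs a: " + Integer.signum(reversed.compare("c", "a")));
    }
}
```

Requiring the language up front means the same configuration produces the same collator on
every machine, whatever locale the JVM happens to start with.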

The required lucene-collation-2.9.1.jar is only 12KB.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

