lucene-solr-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-1571) unicode collation support
Date Sat, 21 Nov 2009 21:14:39 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781053#action_12781053 ]

Robert Muir edited comment on SOLR-1571 at 11/21/09 9:13 PM:
-------------------------------------------------------------

Hi, I wonder if anyone has any comments on this.

I know this is an invisible/covert JIRA issue right now :)

I am especially curious whether the approach is sound, particularly regarding using the ICUCollationFilter
instead.
In my opinion, the ICU filter should be a separate integration, even though it indexes significantly
faster and produces much smaller keys.
The reason is that its keys are not compatible with the JDK collation keys, and it has different properties,
such as the fact that Collator is thread-safe in the JDK but not in ICU.
Because of this, I decided to stick with the JDK impl initially.


      was (Author: rcmuir):
    Hi, i wonder if anyone has any comments on this.

I know this is an invisible/convert JIRA issue right now :)

especially I am curious if the approach is sound, particularly regarding using the ICUCollationFilter
instead.
In my opinion, this should be a separate integration, even though it will index at a significantly
faster speed with much smaller keys.
The reason is that it is not compat with the JDK collation keys, and has different properties,
such as the fact Collator is thread-safe in the JDK, but not thread-safe in ICU.
Because of this, I decided to stick with the JDK impl initially.

  
> unicode collation support
> -------------------------
>
>                 Key: SOLR-1571
>                 URL: https://issues.apache.org/jira/browse/SOLR-1571
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: SOLR-1571.patch
>
>
> This patch adds support for Unicode collation (searching and sorting).
> Unicode collation is helpful in a search engine: for many languages you want things to
> match or sort differently.
> You might even want to use copyField and support different sort orders/matching schemes
> if you need to support multiple languages.
> This is simply a factory for Lucene's CollationKeyFilter, which indexes binary collation
> keys in a special format that preserves binary sort order.
> I've added support for creating a Collator in two ways:
> * system collator from a Locale spec (language + country + variant)
> * tailored collator from custom rules in a text file
> There is deliberately no option to use the "default" locale of the JVM (I consider this
> a bit dangerous).
> In this patch, it is mandatory to define the locale explicitly for a system collator.
> The required lucene-collation-2.9.1.jar is only 12KB.
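
For readers unfamiliar with the JDK side of this, here is a rough sketch of the two ways of creating a Collator described above, using only java.text. It is not the code from SOLR-1571.patch: the class name CollationSketch, its method names, and the inline rule string are made up for illustration, and the rules are passed as a string rather than read from a text file. The last helper compares raw collation key bytes, which is the property that makes the keys indexable; the special index encoding used by CollationKeyFilter is not shown.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical illustration only; not the factory from the patch.
public class CollationSketch {

    // System collator from an explicit locale spec (language + country + variant).
    // There is intentionally no fallback to Locale.getDefault().
    static Collator systemCollator(String language, String country, String variant) {
        if (language == null || language.isEmpty()) {
            throw new IllegalArgumentException("an explicit language is required");
        }
        Locale locale = new Locale.Builder()
                .setLanguage(language)
                .setRegion(country)
                .setVariant(variant)
                .build();
        return Collator.getInstance(locale);
    }

    // Tailored collator from custom rules (the patch reads these from a text file).
    static Collator tailoredCollator(String rules) throws ParseException {
        return new RuleBasedCollator(rules);
    }

    // Compares the collation keys of two strings byte by byte, treating bytes as
    // unsigned. The sign of the result should agree with collator.compare(a, b).
    static int compareKeyBytes(Collator collator, String a, String b) {
        byte[] ka = collator.getCollationKey(a).toByteArray();
        byte[] kb = collator.getCollationKey(b).toByteArray();
        int n = Math.min(ka.length, kb.length);
        for (int i = 0; i < n; i++) {
            int diff = (ka[i] & 0xff) - (kb[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return ka.length - kb.length;
    }

    public static void main(String[] args) throws ParseException {
        Collator german = systemCollator("de", "DE", "");
        System.out.println(german.compare("a", "b"));
        System.out.println(compareKeyBytes(german, "a", "b"));

        // Assumed example rules: sort "b" before "a".
        Collator custom = tailoredCollator("< b < a");
        System.out.println(custom.compare("b", "a"));
    }
}
```

Since the keys compare the same way as the collator itself, a plain binary sort over the indexed keys gives the locale's sort order without needing a Collator at search time.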

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

