lucene-dev mailing list archives

From "Toke Eskildsen (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4799) Enable extraction of originating term for ICU collation keys
Date Tue, 26 Feb 2013 12:12:13 GMT


Toke Eskildsen commented on LUCENE-4799:

I do not understand what you mean by a separate field, Robert. The current patch does not touch
Solr's ICUCollationField; do you mean that Solr support should be added with a new field,
such as ICUCollationExtendedField?

Your primary concern, as I understand it, is that there is currently no clean way to perform
analysis prior to collation key generation. Without a normalization step, we often end up
with multiple keys that should have been the same, such as "CD", "Cd" and "cd".

The patch is a first shot at originating term support and does not attempt to solve the missing
pre-analysis for collation fields, which I consider a wholly separate issue.
> Enable extraction of originating term for ICU collation keys
> ------------------------------------------------------------
>                 Key: LUCENE-4799
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 4.1
>            Reporter: Toke Eskildsen
>            Priority: Minor
>              Labels: collator, facet
>         Attachments: LUCENE-4799.patch
> By concatenating the generated ICU collation key bytes with the originating term, it is
possible to extract the originating term at a later time. This makes it possible to build
a collator-sorted facet field and similar multi-value/document structures.
> ICU collation keys are guaranteed to be terminated by a 0 (
and since comparison of keys stops when a 0 is encountered, the addition of the originating
term does not affect sort order. As 0 is _only_ used for termination in the key bytes, the
extraction of the originating term is unambiguous.
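
The concatenation and extraction described above can be sketched as follows. This is an
illustration only, not code from the attached patch: the class OriginatingTermKey and its
methods are hypothetical names, the term is assumed to be appended as UTF-8, and the key
bytes are made-up stand-ins for real ICU output that merely follow the stated property
(a single 0 byte, at the end).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the LUCENE-4799 patch. */
final class OriginatingTermKey {

    /** Appends the UTF-8 bytes of the term to a 0-terminated collation key. */
    static byte[] concat(byte[] collationKey, String term) {
        byte[] termBytes = term.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(collationKey, collationKey.length + termBytes.length);
        System.arraycopy(termBytes, 0, out, collationKey.length, termBytes.length);
        return out;
    }

    /** Returns everything after the first 0 byte, decoded as UTF-8. */
    static String extract(byte[] combined) {
        for (int i = 0; i < combined.length; i++) {
            if (combined[i] == 0) {
                return new String(combined, i + 1, combined.length - i - 1,
                        StandardCharsets.UTF_8);
            }
        }
        throw new IllegalArgumentException("no 0 terminator found");
    }

    /** Unsigned bytewise comparison over the full byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Assumption: invented key bytes, not produced by a real collator.
        byte[] keyA = {0x2A, 0x2C, 0x01, 0x05, 0x00};
        byte[] keyB = {0x2A, 0x2E, 0x01, 0x05, 0x00};

        byte[] a = concat(keyA, "zebra");
        byte[] b = concat(keyB, "apple");

        System.out.println(extract(a));
        // keyA sorts before keyB, so a sorts before b regardless of the terms.
        System.out.println(compareUnsigned(a, b) < 0);
    }
}
```

When two keys differ, the first differing byte lies at or before the terminating 0, so the
appended term never takes part in the comparison. When two keys are byte-identical, a
comparison that runs over the full array, as in this sketch, falls through to the appended
term bytes, which then act as a tie-breaker.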
