lucene-dev mailing list archives

From "Toke Eskildsen (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4799) Enable extraction of originating term for ICU collation keys
Date Tue, 26 Feb 2013 12:24:13 GMT


Toke Eskildsen commented on LUCENE-4799:

Okay, I see how the order can be affected: if two terms resolve to the same key, the
extended version will result in two separate BytesRefs, while the plain version will result
in only one. This is a problem if the field is used for sorting documents and there is a
secondary sort criterion.
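
To make the effect concrete, here is a minimal sketch. It assumes term bytes are compared as
plain unsigned byte sequences over their full length. The key bytes, the terms and the names
DuplicateKeySketch, concat and distinctCount are all made up for illustration; none of this
is code from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

public class DuplicateKeySketch {
    // Appends the UTF-8 bytes of the term to the (hypothetical) collation key bytes.
    static byte[] concat(byte[] key, String term) {
        byte[] termBytes = term.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(key, key.length + termBytes.length);
        System.arraycopy(termBytes, 0, out, key.length, termBytes.length);
        return out;
    }

    // Counts distinct values under full-length unsigned byte comparison.
    static int distinctCount(byte[][] values) {
        Comparator<byte[]> cmp = (a, b) -> Arrays.compareUnsigned(a, b);
        TreeSet<byte[]> set = new TreeSet<>(cmp);
        set.addAll(Arrays.asList(values));
        return set.size();
    }

    public static void main(String[] args) {
        // Made-up key that two different terms are assumed to share.
        byte[] key = {0x2A, 0x05, 0x00};

        byte[][] plain = {key, key.clone()};
        byte[][] extended = {concat(key, "resume"), concat(key, "Resume")};

        System.out.println("plain: " + distinctCount(plain));
        System.out.println("extended: " + distinctCount(extended));
    }
}
```

With the plain keys the two documents tie and a secondary sort criterion decides their order.
With the extended values they no longer tie, so the secondary criterion is never consulted
for them.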
> Enable extraction of originating term for ICU collation keys
> ------------------------------------------------------------
>                 Key: LUCENE-4799
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 4.1
>            Reporter: Toke Eskildsen
>            Priority: Minor
>              Labels: collator, facet
>         Attachments: LUCENE-4799.patch
> By concatenating the generated ICU collation key bytes with the originating term, it is
possible to extract the originating term at a later time. This makes it possible to build
a collator-sorted facet field and similar multi-value/document structures.
> ICU collation keys are guaranteed to be terminated by a 0 (
and since comparison of keys stops when a 0 is encountered, the addition of the originating
term does not affect sort order. As 0 is _only_ used for termination in the key bytes, the
extraction of the originating term is unambiguous.
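
The scheme described above can be sketched as follows. This is an illustration only, not the
attached patch: it works on raw byte arrays and assumes that each key ends with a single 0
and contains no other 0. The names ExtendedKeySketch, extend, extractTerm and compareKeys
are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExtendedKeySketch {
    // Builds key bytes + term bytes. Assumes keyBytes is 0-terminated
    // and contains no 0 before the terminator.
    static byte[] extend(byte[] keyBytes, String term) {
        byte[] termBytes = term.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(keyBytes, keyBytes.length + termBytes.length);
        System.arraycopy(termBytes, 0, out, keyBytes.length, termBytes.length);
        return out;
    }

    // Everything after the first 0 is the originating term.
    static String extractTerm(byte[] extended) {
        for (int i = 0; i < extended.length; i++) {
            if (extended[i] == 0) {
                return new String(extended, i + 1, extended.length - i - 1,
                        StandardCharsets.UTF_8);
            }
        }
        throw new IllegalArgumentException("no 0 terminator found");
    }

    // Unsigned byte comparison that stops at the first shared 0,
    // so the appended term never takes part in the ordering.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
            if (x == 0) {
                return 0; // both keys terminated at the same position
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] key = {0x2A, 0x05, 0x00}; // made-up key bytes
        byte[] extended = extend(key, "aarhus");
        System.out.println(extractTerm(extended));
        System.out.println(compareKeys(extend(key, "a"), extend(key, "b")));
    }
}
```

Note that the order is only unaffected when the comparison really does stop at the 0. A
comparison over the full byte sequence also sees the appended term bytes.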

