lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1435) CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools
Date Tue, 11 Nov 2008 10:58:44 GMT


Michael McCandless commented on LUCENE-1435:

Could we, alternatively, push this change into DocumentsWriter, such that on writing a segment
it uses a per-field Collator (FieldInfo would be extended to record this) to sort the terms?

I haven't fully thought through the tradeoffs... but it seems like this would be simpler to use?
I.e., rather than putting a CollationKeyFilter in your analyzer chain, and then doing the reverse
of this for all searches at search time, you simply set the Collator on the fields (at both
indexing and searching time, since I agree we should not, for now, try to serialize into the
index which field has which Collator)?

I guess there is a performance cost to using the Collator to do live binary search (during
searching) and sorting (during indexing) vs. doing Unicode String comparisons, but in practice
at search time this is probably a tiny part of the net cost of searching?
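
To make the comparison above concrete, here is a minimal sketch using only java.text.Collator
from the JDK. It is not DocumentsWriter code and not part of any patch on this issue; the class
and method names (CollatorOrderSketch, sortByCodeUnit, sortByCollator, find) and the choice of
Locale.ENGLISH are assumptions made for illustration. It shows the two orderings being weighed:
plain String comparison versus Collator-driven sorting and binary search.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; none of these names exist in Lucene.
public class CollatorOrderSketch {

    /** Returns a sorted copy using String.compareTo (UTF-16 code unit order). */
    static List<String> sortByCodeUnit(List<String> terms) {
        List<String> out = new ArrayList<>(terms);
        Collections.sort(out);
        return out;
    }

    /** Returns a sorted copy using the Collator as the comparator. */
    static List<String> sortByCollator(List<String> terms, Collator collator) {
        List<String> out = new ArrayList<>(terms);
        out.sort(collator); // Collator implements Comparator<Object>
        return out;
    }

    /** Binary search over a list that is already sorted with the same Collator. */
    static int find(List<String> sortedByCollator, String term, Collator collator) {
        return Collections.binarySearch(sortedByCollator, term, collator);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // assumed locale
        List<String> terms = Arrays.asList("zebra", "\u00e9tude", "apple");

        System.out.println(sortByCodeUnit(terms));
        List<String> collated = sortByCollator(terms, collator);
        System.out.println(collated);
        System.out.println(find(collated, "\u00e9tude", collator));
    }
}
```

Each Collator.compare call does more work than String.compareTo, which is the per-comparison
cost referred to above; the sketch does not measure it.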

> CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools
> ----------------------------------------------------------------------------------------------
>                 Key: LUCENE-1435
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>    Affects Versions: 2.4
>            Reporter: Steven Rowe
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1435.patch, LUCENE-1435.patch
> Converts each token into its CollationKey using the provided collator, and then encodes
> the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term.
> This will allow for efficient range searches and Sorts over fields that need collation
> for proper ordering.
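
The first half of the approach described in the issue summary can be sketched with the JDK
alone: turn a token into a CollationKey and compare the resulting byte arrays. This is only an
illustration, not the attached patch. The class and method names (CollationKeySketch, keyBytes,
compareUnsigned) and the Locale.ENGLISH collator are assumptions, and the second step, encoding
the bytes with IndexableBinaryStringTools so they can be stored as a term, is deliberately left
out.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration class; not the CollationKeyFilter from the patch.
public class CollationKeySketch {

    /** Returns the collation key of a token as raw bytes (most significant byte first). */
    static byte[] keyBytes(Collator collator, String token) {
        return collator.getCollationKey(token).toByteArray();
    }

    /** Lexicographic comparison of two byte arrays, treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // assumed locale
        String left = "apple";
        String right = "\u00e9tude";

        int viaCollator = collator.compare(left, right);
        int viaBytes = compareUnsigned(keyBytes(collator, left), keyBytes(collator, right));

        // Both comparisons should agree in sign: the key bytes carry the collation order.
        System.out.println(Integer.signum(viaCollator) == Integer.signum(viaBytes));
    }
}
```

Because the key bytes already carry the collation order, range searches and sorts over such
terms need only byte-wise comparison and no Collator call per comparison.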

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
