lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1435) CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools
Date Tue, 11 Nov 2008 22:31:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646699#action_12646699 ]

Michael McCandless commented on LUCENE-1435:
--------------------------------------------

bq. IndexableBinaryStringTools (LUCENE-1434) implements a base-8000h encoding: the lower 15
bits of each character have 1-7/8 bytes packed into them. It's radically different from the
original byte array, at least in terms of looking at it with a text viewer like Luke. And
I don't think CollationKeys themselves are intended for human consumption.

Oh, OK.  So once this term conversion has been done, the resulting terms in the index
are no longer human-readable (you'd have to store the original text yourself).
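
To make the "15 bits per character" idea concrete, here is a minimal sketch of packing a byte
array into the lower 15 bits of each char. It is an illustration only, not the LUCENE-1434
code: the class name FifteenBitPackingSketch and the method pack() are hypothetical, and the
zero-padding of the last char is an assumption made for this example.

```java
class FifteenBitPackingSketch {
    // Hypothetical helper, NOT the IndexableBinaryStringTools implementation.
    // Packs the input bits, most significant first, into the lower 15 bits of
    // each output char, so every char stays in the range 0x0000..0x7FFF.
    static String pack(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        int buffer = 0;    // bits read from the input but not yet emitted
        int bitCount = 0;  // number of valid bits currently in buffer
        for (byte b : bytes) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitCount += 8;
            while (bitCount >= 15) {
                bitCount -= 15;
                out.append((char) ((buffer >>> bitCount) & 0x7FFF));
                buffer &= (1 << bitCount) - 1;  // keep only the unemitted bits
            }
        }
        if (bitCount > 0) {
            // Assumption for this sketch: pad the final group with zero bits.
            out.append((char) ((buffer << (15 - bitCount)) & 0x7FFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 15 bytes = 120 bits = exactly 8 chars of 15 bits (1-7/8 bytes per char).
        byte[] input = new byte[15];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) (i * 17);
        }
        String packed = pack(input);
        System.out.println(input.length + " bytes -> " + packed.length() + " chars");
        for (char c : packed.toCharArray()) {
            System.out.printf("%04x ", (int) c);
        }
        System.out.println();
    }
}
```

The packed chars bear no visible relation to the input bytes, which is why such terms are
not useful to read in a viewer like Luke.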

bq. Perhaps I'm missing something, but o.a.l.index.TermEnum.skipTo(Term) compares the target
term using String.compareTo(),

But we could just fix that to pay attention to the Collator for that field, if it has one,
right?  (Or, with flexible indexing, I think the implementation really should own this
method, i.e., it should be abstract in TermEnum.)
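
For reference, the ordering mismatch under discussion can be seen with the JDK alone. The
sketch below uses only java.text.Collator and java.text.CollationKey; the class name
CollationOrderSketch and the helper compareUnsigned() are hypothetical, and the English
locale is just an assumption for the example. It does not touch TermEnum or any Lucene API.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

class CollationOrderSketch {
    // Hypothetical helper: compares two byte arrays as unsigned bytes, most
    // significant byte first, which is how CollationKey.toByteArray() is
    // documented to be comparable.
    static int compareUnsigned(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String a = "a";
        String z = "Z";

        // String.compareTo() compares UTF-16 code units: 'Z' (0x5A) < 'a' (0x61),
        // so "a" sorts AFTER "Z".
        System.out.println("String.compareTo: " + Integer.signum(a.compareTo(z)));

        // The Collator applies language rules, so "a" sorts BEFORE "Z".
        System.out.println("Collator.compare: " + Integer.signum(collator.compare(a, z)));

        // CollationKeys, and their byte arrays, reproduce the Collator's order,
        // which is what makes them usable as pre-sorted index terms.
        CollationKey ka = collator.getCollationKey(a);
        CollationKey kz = collator.getCollationKey(z);
        System.out.println("CollationKey.compareTo: " + Integer.signum(ka.compareTo(kz)));
        System.out.println("key bytes (unsigned): "
                + Integer.signum(compareUnsigned(ka.toByteArray(), kz.toByteArray())));
    }
}
```

A term dictionary that compares raw strings sees the first ordering, while a
Collator-aware comparison would see the second.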

I think the external approach is fine for starters... I just think long-term it may make sense
to have core Lucene respect the Collator, but it really is an invasive change.  We should
wait until we make progress on flexible indexing at which point such a change should be far
less costly.

> CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1435
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1435
>             Project: Lucene - Java
>          Issue Type: New Feature
>    Affects Versions: 2.4
>            Reporter: Steven Rowe
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1435.patch, LUCENE-1435.patch
>
>
> Converts each token into its CollationKey using the provided collator, and then encodes
> the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term.
> This will allow for efficient range searches and Sorts over fields that need collation
> for proper ordering.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

