lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2514) Change Term to use bytes
Date Thu, 24 Jun 2010 21:45:52 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2514:
--------------------------------

    Attachment: LUCENE-2514.patch

Attached is an updated patch with Uwe's changes, plus some additional conversions such as TermsFilter
and FieldCacheTermsFilter.

The range ones are a bit tricky, mainly because they work with collators, which makes no sense
with byte[]. But if the collator is null, then byte[] makes sense.

The collator stuff is silly in a way: if we switch collation to byte[], it will use less RAM
than even the original String in Lucene 3.x, and sort much faster.

One option might be to split the collating range stuff into its own classes or something.
I think it's a bit confusing how collation is mixed in with 'binary' order... it tricks you
into thinking the 'default' is UCA or the default locale or something, but it is neither.
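
To make the collation-versus-binary distinction concrete, here is a rough sketch (not taken from the patch). It uses the JDK's java.text.Collator as a stand-in for whatever collator a range query would be given; the class name CollationKeyDemo and its helper methods are made up for illustration. It only shows the ordering property: once a collator has produced a key, comparing the key bytes as unsigned values gives the collated order, while plain String comparison gives a different, 'binary' order. It says nothing about the RAM or speed claims above.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationKeyDemo {
    // Unsigned lexicographic comparison of two byte arrays (hypothetical helper).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Collation key of s as raw bytes, using the JDK collator.
    static byte[] key(Collator collator, String s) {
        return collator.getCollationKey(s).toByteArray();
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);

        // Binary (UTF-16 code unit) order: 'B' (66) sorts before 'a' (97).
        System.out.println(Integer.signum("a".compareTo("B")));

        // Collated order: "a" sorts before "B".
        System.out.println(Integer.signum(english.compare("a", "B")));

        // Comparing the collation key bytes agrees with the collator.
        System.out.println(Integer.signum(
            compareBytes(key(english, "a"), key(english, "B"))));
    }
}
```

With no collator, the bytes compared would simply be the term bytes themselves, which is the case where byte[] ranges are straightforward.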

> Change Term to use bytes
> ------------------------
>
>                 Key: LUCENE-2514
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2514
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Search
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-2514-surrogates-dance.patch, LUCENE-2514.patch, LUCENE-2514.patch, LUCENE-2514.patch, LUCENE-2514.patch, LUCENE-2514.patch, LUCENE-2514.patch
>
>
> in LUCENE-2426, the sort order was changed to codepoint order.
> unfortunately, Term is still using string internally, and more importantly its compareTo() uses the wrong order [utf-16].
> So MultiTermQuery, etc (especially its priority queues) are currently wrong.
> By changing Term to use bytes, we can also support terms encoded as bytes such as numerics, instead of using strange string encodings.
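
The ordering mismatch described in the issue can be reproduced with plain JDK calls. The following is a minimal sketch, not code from the patch; the class name TermOrderDemo and its helpers are hypothetical. It compares one character near the top of the Basic Multilingual Plane (U+FF5E) with one supplementary character (U+1D538): String.compareTo works on UTF-16 code units and puts the supplementary character first, because its leading surrogate 0xD835 is smaller than 0xFF5E, whereas comparing the UTF-8 bytes as unsigned values follows codepoint order and puts it last.

```java
import java.nio.charset.StandardCharsets;

public class TermOrderDemo {
    // Unsigned lexicographic comparison of two byte arrays (hypothetical helper).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Compare two strings by their UTF-8 bytes, which matches codepoint order.
    static int compareUtf8(String a, String b) {
        return compareBytes(a.getBytes(StandardCharsets.UTF_8),
                            b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, one UTF-16 code unit
        String supplementary = "\uD835\uDD38"; // U+1D538, a surrogate pair

        // UTF-16 code unit order: 0xFF5E > 0xD835, so bmp sorts last.
        System.out.println(Integer.signum(bmp.compareTo(supplementary)));   // prints 1

        // UTF-8 bytes: EF BD 9E vs F0 9D 94 B8, so bmp sorts first.
        System.out.println(Integer.signum(compareUtf8(bmp, supplementary))); // prints -1
    }
}
```

Any structure that merges terms with the first comparison while the index is sorted by the second will disagree on pairs like this one.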
