lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
Date Mon, 29 Mar 2010 17:25:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851010#action_12851010 ]

Uwe Schindler edited comment on LUCENE-2354 at 3/29/10 5:23 PM:
----------------------------------------------------------------

bq. But the encoding is unchanged right? (Ie only using 7 bits per byte, same as trunk).

Yes. And I think we should keep it at 7 bits for now. Problems start when the sort order
of terms is needed (which is the case for NRQ). As the default in flex is the UTF-8 term comparator,
I suspect it would not sort correctly for numeric fields using the full 8 bits.
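
To illustrate the 7-bit point, here is a minimal sketch, not the actual NumericUtils code: the class and method names are hypothetical, and the shift/prefix byte that the real encoding carries is left out. It packs an int into bytes that each hold only 7 payload bits, so every byte stays below 0x80 and is plain ASCII. Such bytes are valid single-byte UTF-8, and comparing them as bytes or as characters gives the same order as the original numbers.

```java
final class SevenBitEncodingSketch {

    /** Encodes an int as 5 bytes, 7 payload bits per byte, most significant group first. */
    static byte[] encode(int value) {
        // Flip the sign bit so negative ints order before positive ones
        // when the result is compared as an unsigned quantity.
        int sortable = value ^ 0x80000000;
        byte[] out = new byte[5]; // 5 * 7 = 35 bits, enough for 32
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = (byte) (sortable & 0x7f); // high bit of every byte stays 0
            sortable >>>= 7;
        }
        return out;
    }

    /** Unsigned lexicographic comparison, byte by byte. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (int i = 0; i + 1 < values.length; i++) {
            int cmp = compareBytes(encode(values[i]), encode(values[i + 1]));
            System.out.println(values[i] + " sorts before " + values[i + 1] + ": " + (cmp < 0));
        }
    }
}
```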

By the way, the recently added backwards test checks that an old index with NumericField behaves
as before! This is why I added a new zip file to TestBackwardCompatibility.

bq. And you cutover to BytesRef TermsEnum API too - great. Presumably search perf would improve
but only a tiny bit since NRQ visits so few terms?

I don't think you will notice a difference. A standard int range contains maybe 10 to 20 sub-ranges
(at maximum), so converting between string and TermRef should not count. But the new implementation
is cleaner. In principle we could remove the whole char[]/String based API in NumericUtils
- I only have to rewrite the tests and remove the NumericUtils test in backwards (as it no longer
applies then, too).
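
As a rough illustration of why so few sub-ranges are involved, the sketch below splits a range into blocks at increasing shift levels. It is a simplified, hypothetical version written for this note, not the NumericUtils implementation: it assumes unsigned 32-bit values held in a long (0 to 2^32 - 1) and a precision step of 4, and it skips the sign handling and overflow checks a real implementation needs. Each level contributes at most a lower and an upper edge block, plus one final middle block, which gives at most 15 sub-ranges under these assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSplitSketch {

    /** Each entry is {min, max, shift}: values min..max (inclusive), matched at that shift level. */
    static List<long[]> split(long min, long max, int valSize, int precisionStep) {
        List<long[]> out = new ArrayList<>();
        if (min > max) {
            return out;
        }
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;   // min is not aligned to the next level
            boolean hasUpper = (max & mask) != mask; // max does not end a next-level block
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + precisionStep >= valSize || nextMin > nextMax) {
                // Lowest precision reached, or nothing left for the next level.
                add(out, min, max, shift);
                break;
            }
            if (hasLower) {
                add(out, min, min | mask, shift);
            }
            if (hasUpper) {
                add(out, max & ~mask, max, shift);
            }
            min = nextMin;
            max = nextMax;
        }
        return out;
    }

    private static void add(List<long[]> out, long min, long max, int shift) {
        // Restore the low bits of the upper bound that were masked away at this level.
        out.add(new long[] {min, max | ((1L << shift) - 1L), shift});
    }

    public static void main(String[] args) {
        for (long[] r : split(3, 300, 32, 4)) {
            System.out.println("[" + r[0] + ", " + r[1] + "] at shift " + r[2]);
        }
    }
}
```

A smaller precision step would mean more levels and therefore more, but cheaper, sub-ranges; the figure of 15 here follows only from the assumed step of 4 on 32 bits.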

  
> Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2354
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2354
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Flex Branch
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2354.patch
>
>
> After LUCENE-2302, we should use TermToBytesRefAttribute to index using NumericTokenStream.
> This also should convert the whole NumericUtils to use BytesRef when converting numerics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

