lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2514) Change Term to use bytes
Date Thu, 24 Jun 2010 20:45:49 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882321#action_12882321 ]

Uwe Schindler commented on LUCENE-2514:
---------------------------------------

bq. For example, currently the priority queue in TopTerms does BytesRef -> String conversion
and creates a new Term for each add, but this might be entirely useless as it could fall off
the pq, so i think its ScoreTerm or whatever should not hold term at all but just bytesref

Exactly! We removed support for TermEnum (without s), so the field name is never null. You can
always take the field from the MTQ when building TermQueries. For that, we create the Term
using new Term(field, BytesRef) or with the non-interning placeholder (see also below). This
makes MTQ much simpler; I have started on it...
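
To make the idea concrete, here is a minimal sketch in plain Java of a top-k queue that holds only raw term bytes and a score, and builds a term object only for entries that survive. It does not use Lucene: ScoredBytes, SimpleTerm, topTerms and the conversions counter are hypothetical names invented for this illustration, and the code is not taken from any patch on this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopTermsSketch {
    /** Hypothetical queue entry: raw term bytes and a score, with no String or Term attached. */
    static final class ScoredBytes {
        final byte[] bytes;
        final float score;

        ScoredBytes(byte[] bytes, float score) {
            // Copy defensively, in case the caller reuses its buffer between terms.
            this.bytes = bytes.clone();
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a (field, text) term; this is not Lucene's Term class. */
    static final class SimpleTerm {
        final String field;
        final String text;

        SimpleTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Counts bytes-to-String conversions so the saving is observable. */
    static int conversions = 0;

    static List<SimpleTerm> topTerms(String field, List<ScoredBytes> candidates, int k) {
        // Min-heap on score: the weakest survivor sits at the head and is evicted first.
        PriorityQueue<ScoredBytes> pq =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (ScoredBytes c : candidates) {
            pq.add(c);
            if (pq.size() > k) {
                pq.poll(); // falls off the queue without ever being decoded
            }
        }
        // Only the survivors are decoded and paired with the query's field.
        List<SimpleTerm> result = new ArrayList<>();
        while (!pq.isEmpty()) {
            ScoredBytes s = pq.poll();
            conversions++;
            result.add(new SimpleTerm(field, new String(s.bytes, StandardCharsets.UTF_8)));
        }
        Collections.reverse(result); // best score first
        return result;
    }

    public static void main(String[] args) {
        List<ScoredBytes> candidates = new ArrayList<>();
        String[] texts = {"apple", "banana", "cherry", "date"};
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f};
        for (int i = 0; i < texts.length; i++) {
            candidates.add(new ScoredBytes(texts[i].getBytes(StandardCharsets.UTF_8), scores[i]));
        }
        for (SimpleTerm t : topTerms("body", candidates, 2)) {
            System.out.println(t.field + ":" + t.text);
        }
        System.out.println("conversions = " + conversions);
    }
}
```

With four candidates and k = 2, only two entries are ever decoded; the field name comes from the caller, much as it would come from the MTQ.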

By the way: could we remove all String interning for field names now? We don't compare fields
anymore, do we?

> Change Term to use bytes
> ------------------------
>
>                 Key: LUCENE-2514
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2514
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Search
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-2514-surrogates-dance.patch, LUCENE-2514.patch, LUCENE-2514.patch, LUCENE-2514.patch, LUCENE-2514.patch
>
>
> In LUCENE-2426, the sort order was changed to codepoint order.
> Unfortunately, Term is still using String internally, and more importantly its compareTo()
> uses the wrong order [UTF-16].
> So MultiTermQuery, etc. (especially its priority queues) are currently wrong.
> By changing Term to use bytes, we can also support terms encoded as bytes, such as numerics,
> instead of using strange string encodings.
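
The ordering mismatch described in the issue can be reproduced without Lucene. The sketch below compares a BMP character above the surrogate range with a supplementary character, once with String.compareTo() (UTF-16 code unit order) and once by comparing their UTF-8 bytes as unsigned values (which matches codepoint order). TermOrderDemo and compareUtf8 are hypothetical names chosen for this illustration.

```java
import java.nio.charset.StandardCharsets;

class TermOrderDemo {
    /** Compares byte arrays as unsigned bytes; for UTF-8 input this equals codepoint order. */
    static int compareUtf8(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";        // U+FF5E: one UTF-16 unit, above the surrogate range
        String supp = "\uD800\uDC00"; // U+10000: a surrogate pair in UTF-16

        // UTF-16 order: 0xFF5E is compared with the lead surrogate 0xD800.
        int utf16 = bmp.compareTo(supp);

        // UTF-8 order: EF BD 9E is compared with F0 90 80 80.
        int utf8 = compareUtf8(bmp.getBytes(StandardCharsets.UTF_8),
                               supp.getBytes(StandardCharsets.UTF_8));

        System.out.println("UTF-16 says U+FF5E sorts "
                + (utf16 > 0 ? "after" : "before") + " U+10000");
        System.out.println("UTF-8 says U+FF5E sorts "
                + (utf8 > 0 ? "after" : "before") + " U+10000");
    }
}
```

The two comparisons disagree for this pair, so a queue ordered by String.compareTo() and an index sorted by codepoint can rank the same terms differently.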

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



