lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: NumberTools
Date Tue, 22 Mar 2005 08:25:23 GMT
: > I can see in FieldDocSortedHitQueue where the case statement deals with
: > the various types of SortField, but at that point it's comparing FieldDoc
: > objects whose fields[i] is expected to already be an "Integer" object.
: > where is that "Integer" object parsed from the String value of the field?
: >
: Surely, by using the number -> string algorithm I showed earlier this
: would not be a problem.  Did I miss something?

I haven't worked through the math to prove to myself that your algorithm
is a viable way of expressing any Integer as a 4 byte String, such that
any two Integers sort correctly when compared lexicographically as
strings ... but let's assume that I have, and that it works perfectly.

So now any RangeFilter or RangeQuery (which operate on String term values)
will work ... what about sorting?

Well, the basis of your idea is custom code to format the Integer as a
string, namely...

:   public static String convertTotText(int input)
:   {
:     int unsigned = input + Integer.MIN_VALUE;
:     char c2 = (char) (unsigned & 0x0000FFFF);
:     char c1 = (char) (unsigned >> 16 & 0x0000FFFF);
:     return new String(new char[] {c1, c2});
:   }

Since they are Strings, the Lucene sorting code is not going to look at
those and recognize them as numbers, and even if you specified
SortField.INT, the default parser (wherever it is) isn't going to be able
to make heads or tails of them -- so it's going to have to sort them as
Strings, which is slower than sorting them as Integers -- even if they are
only 4 bytes long (unless I'm wrong, which is entirely possible ... I
haven't tested it).
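
A minimal sketch of why a decimal parser can't read these strings,
assuming the default parser behaves like Integer.parseInt (that is an
assumption; I haven't traced the Lucene code). DefaultParseDemo and
rejectedByParseInt are made-up names for illustration; convertTotText is
copied from the quoted message so the sketch stands alone.

```java
final class DefaultParseDemo {
    // Copy of the quoted convertTotText logic, repeated so this sketch is self-contained.
    static String convertTotText(int input) {
        int unsigned = input + Integer.MIN_VALUE;       // flips the sign bit (overflow wraps)
        char c2 = (char) (unsigned & 0x0000FFFF);       // low 16 bits
        char c1 = (char) (unsigned >> 16 & 0x0000FFFF); // high 16 bits
        return new String(new char[] {c1, c2});
    }

    // Hypothetical helper: true if Integer.parseInt rejects the encoded form of value.
    static boolean rejectedByParseInt(int value) {
        try {
            Integer.parseInt(convertTotText(value));
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }
}
```

For ordinary values such as 0 or 42 the encoded string starts with a
non-digit character and parseInt throws. A few values happen to encode
to two digit characters, and those would parse without error but to the
wrong number, which is arguably worse.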

Now, if we could override the method used to parse the Term values from
Strings to Integers (using a user-specified NumberFormat as I proposed)
then we could rename your "convertTotText" method to "format" and write a
corresponding "parse" method, and everything would work smashingly.

