lucene-java-user mailing list archives

From Christoph Kaser <christoph.ka...@iconparc.de>
Subject Re: Numeric field min max values
Date Tue, 08 Nov 2011 08:12:29 GMT
Hi Chris,

Here is some code we use to obtain the int values from the TermEnum:

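         // Needs java.util.HashSet, org.apache.lucene.index.Term,
         // org.apache.lucene.index.TermEnum and org.apache.lucene.util.NumericUtils
         HashSet<Integer> ints = new HashSet<Integer>();
         TermEnum te = reader.terms(new Term(fieldName, ""));
         try {
             do {
                 Term term = te.term();
                 // Stop at the end of the enum or when it moves on to another field
                 if (term == null || !term.field().equals(fieldName))
                     break;
                 String val = term.text();

                 // See the FieldCache implementation: NumericFields add some
                 // values that are only needed for range querying
                 final int shift = val.charAt(0) - NumericUtils.SHIFT_START_INT;
                 if (shift > 0 && shift <= 31)
                     break;

                 ints.add(NumericUtils.prefixCodedToInt(val));
             } while (te.next());
         } finally {
             te.close();
         }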
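
To illustrate why the loop above has to stop at the first term with shift > 0, here is a rough, Lucene-free model of what decoding every trie term of a single int would give. It is only a sketch under assumptions: the class and method names are made up, the step of 4 is assumed to match the default precisionStep, and the bit handling (flip the sign bit, drop the low `shift` bits) is my understanding of what the prefix coding boils down to, not a copy of the NumericUtils source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class TrieTermSketch {

    // Returns the ints you would get by decoding each term that a trie
    // field stores for one value: one term per shift 0, step, 2*step, ...
    static List<Integer> decodedTerms(int value, int precisionStep) {
        List<Integer> out = new ArrayList<Integer>();
        int sortable = value ^ 0x80000000; // flip sign bit so ints order like unsigned
        for (int shift = 0; shift < 32; shift += precisionStep) {
            int truncated = (sortable >>> shift) << shift; // low `shift` bits are lost
            out.add(truncated ^ 0x80000000);               // undo the sign-bit flip
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the first entry (shift 0) is the real field value.
        System.out.println(decodedTerms(1000, 4));
        // prints [1000, 992, 768, 0, 0, 0, 0, 0]
    }
}
```

In this model a small positive value decodes to 0 at the higher shifts, which would match the '0' term Chris saw. Since the full-precision terms come first and sort in numeric order, the first value the loop adds should be the minimum and the last one before the break the maximum.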

Hope that helps,

Christoph Kaser

On 07.11.2011 21:07, Uwe Schindler wrote:
> This is caused by the lower-precision terms that NumericField indexes to allow fast NumericRangeQuery.
> You have to filter those values by looking at the first few bits, which contain the precision.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Christian Reuschling [mailto:christian.reuschling@gmail.com]
>> Sent: Monday, November 07, 2011 8:17 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Numeric field min max values
>>
>> hm - I noticed that when I iterate with TermEnum and decode the value
>> with prefixCodedToInt(..), I get correct values, but I also get values that are not
>> field values of this field anywhere in the index.
>> E.g. in the number-encoded field with the timestamps I also get a '0'
>> as a term - but all documents have a correct timestamp.
>> I also noticed that Luke shows the same values, even when the correct
>> decoder is selected. Luke also offers 'browse term docs', and
>> says that every document is a '0'-term document.
>>
>> Does anyone have an idea?
>>
>> best
>>
>> Chris
>>
>> 2011/11/3 Christian Reuschling<christian.reuschling@gmail.com>:
>>> Thank you very much! This exactly solves my problem
>>>
>>>
>>> 2011/11/3 Ian Lea<ian.lea@gmail.com>:
>>>> I can't answer most of the questions, but oal.util.NumericUtils has
>>>> prefixCodedToInt (Long, etc) methods that will convert the encoded
>>>> value (what you are seeing, I presume) to int or long or whatever.
>>>> Maybe that will help.
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling
>>>> <christian.reuschling@gmail.com>  wrote:
>>>>> Hi,
>>>>>
>>>>> maybe it is an easy question - I searched the lucene-user
>>>>> archive, but sadly didn't find an answer :(
>>>>>
>>>>> I am currently changing our field logic from string to numeric fields.
>>>>> Until now, I found the min-max values of a field by
>>>>> iterating over the field with a TermEnum (termEnum =
>>>>> reader.terms(new Term(strFieldName, ""));).
>>>>>
>>>>> Now, in the case of a numeric field, I get some strange field values
>>>>> such as "$)A M`" - I guess this could be a low-precision token from the
>>>>> field trie?
>>>>>
>>>>> Is there a special way to iterate over numeric field values? Or is
>>>>> there a possibility to get the trie and ask it for the min-max
>>>>> values? Or another (util) class?
>>>>>
>>>>> Thanks for all answers!
>>>>>
>>>>> Chris
>>>>>
>>>>> --------------------------------------------------------------------
>>>>> - To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org


-- 
Dipl.-Inf. Christoph Kaser

IconParc GmbH
Sophienstrasse 1
80333 München

www.iconparc.de

Tel +49 -89- 15 90 06 - 21
Fax +49 -89- 15 90 06 - 49

Managing directors: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
121830, Amtsgericht München



