lucene-java-user mailing list archives

From Christian Reuschling <christian.reuschl...@gmail.com>
Subject Re: Numeric field min max values
Date Tue, 08 Nov 2011 11:58:35 GMT
Thank you very much - this really helps me a lot!


2011/11/8 Christoph Kaser <christoph.kaser@iconparc.de>:
> Hi Chris,
>
> Here is some code we use to obtain the int values from the TermEnum:
>
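>        // Needs java.util.HashSet, org.apache.lucene.index.Term,
>        // org.apache.lucene.index.TermEnum, org.apache.lucene.util.NumericUtils
>        HashSet<Integer> ints = new HashSet<Integer>();
>        TermEnum te = reader.terms(new Term(fieldName, ""));
>        try {
>            do {
>                Term term = te.term();
>                // Stop at the end of the enum or once we leave the field
>                if (term == null || !fieldName.equals(term.field()))
>                    break;
>                String val = term.text();
>
>                // See the FieldCache implementation: NumericFields add some
>                // values that are only needed for range querying
>                final int shift = val.charAt(0) - NumericUtils.SHIFT_START_INT;
>                if (shift > 0 && shift <= 31)
>                    break;
>
>                ints.add(NumericUtils.prefixCodedToInt(val));
>            } while (te.next());
>        } finally {
>            te.close();
>        }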
>
> Hope that helps,
>
> Christoph Kaser
>
> Am 07.11.2011 21:07, schrieb Uwe Schindler:
>>
>> This is caused by lower-precision terms used by NumericField to allow fast
>> NumericRangeQuery. You have to filter those values by looking at the first
>> few bits, which contain the precision.
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: Christian Reuschling [mailto:christian.reuschling@gmail.com]
>>> Sent: Monday, November 07, 2011 8:17 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Numeric field min max values
>>>
>>> Hm - I noticed that when I iterate with TermEnum and decode the values
>>> with prefixCodedToInt(..), I get correct values, but I also get values
>>> that are not field values of this field anywhere in the index.
>>> E.g. in the number-encoded field with the timestamps I also get a '0'
>>> as a term - but all documents have a correct timestamp.
>>> I also noticed that Luke shows the same values, even when the correct
>>> decoder is selected. Luke also offers the option to 'browse term docs',
>>> and says that every document is a '0'-term document.
>>>
>>> Does anyone have an idea?
>>>
>>> best
>>>
>>> Chris
>>>
>>> 2011/11/3 Christian Reuschling<christian.reuschling@gmail.com>:
>>>>
>>>> Thank you very much! This exactly solves my problem
>>>>
>>>>
>>>> 2011/11/3 Ian Lea<ian.lea@gmail.com>:
>>>>>
>>>>> I can't answer most of the questions, but oal.util.NumericUtils has
>>>>> prefixCodedToInt (Long, etc) methods that will convert the encoded
>>>>> value (what you are seeing, I presume) to int or long or whatever.
>>>>> Maybe that will help.
>>>>>
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>>
>>>>> On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling
>>>>> <christian.reuschling@gmail.com>  wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> maybe this is an easy question - I searched the lucene-user
>>>>>> archive, but sadly didn't find an answer :(
>>>>>>
>>>>>> I am currently changing our field logic from string to numeric fields.
>>>>>> Until now, I managed to find the min-max values of a field by
>>>>>> iterating over the field with a TermEnum (termEnum =
>>>>>> reader.terms(new Term(strFieldName, ""));).
>>>>>>
>>>>>> Now, in the case of a numeric field, I get some strange field values
>>>>>> such as "$)A M`" - I guess this could be a low-precision token from
>>>>>> the field trie?
>>>>>>
>>>>>> Is there a special way to iterate over numeric field values? Or is
>>>>>> there a way to get the trie and ask it for the min-max values? Or
>>>>>> another (util) class?
>>>>>>
>>>>>> Thanks for all answers!
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> --------------------------------------------------------------------
>>>>>> - To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>
>>
>>
>
>
> --
> Dipl.-Inf. Christoph Kaser
>
> IconParc GmbH
> Sophienstrasse 1
> 80333 München
>
> www.iconparc.de
>
> Tel +49 -89- 15 90 06 - 21
> Fax +49 -89- 15 90 06 - 49
>
> Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
> 121830, Amtsgericht München
>
>
>
>
>
>
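
To make the point about lower-precision terms concrete, here is a small
stand-alone sketch that does not use Lucene at all. It re-implements an int
prefix coding along the lines described in this thread (one leading character
carrying the shift, then 7 bits per character) and then reads min and max from
the sorted full-precision terms only. The class name, the helper indexTerms and
the sample values are made up for illustration, and the constant 0x60 is assumed
to match NumericUtils.SHIFT_START_INT in Lucene 3.x; the real NumericUtils may
differ in details, so treat this as an illustration rather than a replacement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NumericTermSketch {
    // Assumption: same value as NumericUtils.SHIFT_START_INT in Lucene 3.x.
    static final char SHIFT_START_INT = 0x60;

    /** Encodes val at the given shift: one shift char, then 7 bits per char. */
    static String encode(int val, int shift) {
        int nChars = (31 - shift) / 7 + 1;
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START_INT + shift);
        // Flip the sign bit so that string order equals numeric order.
        int sortableBits = (val ^ 0x80000000) >>> shift;
        for (int i = nChars; i >= 1; i--) {
            buf[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buf);
    }

    /** Decodes a term; the bits dropped by the shift come back as zeros. */
    static int decode(String term) {
        int shift = term.charAt(0) - SHIFT_START_INT;
        int sortableBits = 0;
        for (int i = 1; i < term.length(); i++) {
            sortableBits = (sortableBits << 7) | term.charAt(i);
        }
        return (sortableBits << shift) ^ 0x80000000;
    }

    /** Hypothetical helper: sorted, de-duplicated terms for the given values. */
    static List<String> indexTerms(int[] values, int precisionStep) {
        List<String> terms = new ArrayList<String>();
        for (int v : values) {
            for (int shift = 0; shift < 32; shift += precisionStep) {
                String t = encode(v, shift);
                if (!terms.contains(t)) terms.add(t);
            }
        }
        Collections.sort(terms);
        return terms;
    }

    /** Returns {min, max} over full-precision terms, or null if there are none. */
    static int[] minMax(List<String> sortedTerms) {
        Integer min = null, max = null;
        for (String t : sortedTerms) {
            // Terms with shift > 0 sort after all full-precision terms.
            if (t.charAt(0) - SHIFT_START_INT > 0) break;
            int v = decode(t);
            if (min == null) min = v; // first full-precision term is the minimum
            max = v;                  // last one seen is the maximum
        }
        return min == null ? null : new int[] { min, max };
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms(new int[] { 1000, 42, -7, 500000 }, 4);
        int[] mm = minMax(terms);
        System.out.println("min=" + mm[0] + " max=" + mm[1]); // min=-7 max=500000
        // A coarse term for 1000 decodes to 0 although no document holds 0.
        System.out.println(decode(encode(1000, 16)));         // 0
    }
}
```

Because the shift character comes first, all full-precision terms sort ahead of
the coarser ones, so the loop can stop at the first term with a non-zero shift.
The last line of main shows one plausible way a '0' can turn up among the
decoded values even though no document contains it.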


