lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ivan Vasilev <ivasi...@sirma.bg>
Subject Re: Treating values of numeric fields as numbers
Date Fri, 14 Sep 2007 12:26:29 GMT
Hi Eric,

Yes they help for particularly this problem - they shrik all avilable 
Longs to 14 character strings. So they free me from the limitation that 
I wrote about (1970 - 2280 years).
So now the advantage that I have with treating fields representing UTC 
dates as numbers (but not as strings) seems to be some times smaller. If 
storing UTCs in radix 36 and considering the real situation in my case - 
I have only 2 UTC-fields with which keep dates of theese years I will 
save approximate 20MB for 1 million docs which is 0,5 - 2 % of the index 
size.

But may be this approach that I describe in my first mail (and tested it 
successfully by now) will help to someone who uses float values in the 
index or longs that vary reasonably in length.

Best Regards,
Ivan

Erik Hatcher wrote:
> Ivan - have you considered using NumberUtils?
>
>   
> <http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/NumberTools.html>

>
>
>
> I'm curious if those utility methods solve the same problem you're 
> working on.
>
>     Erik
>
>
> On Sep 13, 2007, at 1:19 PM, Ivan Vasilev wrote:
>
>> Hi All,
>>
>> I have made some changes in my Lucene source, so that values of 
>> numeric fields to be treated as numbers but not as Strings. After 
>> testing everything seems to work correctly, but I still would like to 
>> know your opinion about this.
>> So my approach is the following:
>>
>> 1. As during the indexing process the terms are ordered according 
>> their values I made changes in the methods 
>> TermBuffer.compareTo(TermBuffer other) and Term.compareTo(Term other) 
>> so that when the filed contains numeric data comparison to be made 
>> based on numeric logic (Integer.compareto(..), Float.compareTo(..)). 
>> So this orders terms based on numeric logic but not based on 
>> lexicographical one.
>>
>> 2. To work correctly range searches similar changes were made in 
>> RangeFilter.bits(IndexReader reader) and 
>> RangeQuery.rewrite(IndexReader reader) methods.
>>
>> Changes seem to be very simple, but I did not found case when they 
>> lead to wrong behavior.
>> Before these changes to make range queries on numeric fields I made 
>> values of those fields with fixed length so that the lexicographical 
>> order to be the same like the numeric one. So I had to keep dates in 
>> some fields and I made them 13 length fields that keep UTC 
>> representation of the date. When the UTC was short I prefixed it with 
>> zeroes. This made range searches to work correctly for the time 
>> interval 1970 - 2280 approximately. Now with the new implementation I 
>> do not have any restrictions about this time interval.
>>
>> So before some time I digged in Lucene forum and saw there were some 
>> discussions about this. So if anybody uses also such approach, or 
>> have bad experience with it, please tell me.
>> Tanks in advance.
>>
>>
>> Best Regards,
>> Ivan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> __________ NOD32 2528 (20070913) Information __________
>
> This message was checked by NOD32 antivirus system.
> http://www.eset.com
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message