lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: DocValues questions
Date Thu, 04 Apr 2013 22:36:24 GMT
On Thu, Apr 4, 2013 at 11:03 PM, Wei Wang <welshwang@gmail.com> wrote:
> Given the new Lucene 4.2 DocValues API, it seems no matter it is byte,
> short, int, or long, they are all stored as NumericDocValuesField. Does
> this mean "long" values are always stored regardless of the initial type?
> If so, do we still save space if the value range is small? Do we need to
> give some hint to NumericDocValuesField to save space?

Space savings are codec-dependent, but the default codecs use bit
packing to save space. For example:
 - if all your values are between 0 and 255, Lucene will only use 8
bits per value on average,
 - if your documents only have three distinct values 1, 100 and 10000,
Lucene will detect that this is a low-cardinality field and only use 2
bits per value on average.
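
To make the arithmetic behind those two cases concrete, here is a minimal
sketch in plain Java. It does not use Lucene and is not the codec's actual
implementation; the class and method names (BitsPerValueSketch,
bitsForRange, bitsForTable) are made up for illustration. It only computes
how many bits per value are needed if you either store each value as an
offset from the minimum, or store each value as an index into a table of
the distinct values.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: this is NOT Lucene code.
class BitsPerValueSketch {

    // Number of bits needed to represent any value in [0, maxValue] (at least 1).
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Bits per value if each value is stored as (value - min).
    static int bitsForRange(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long delta = max - min;
        // A negative delta means the subtraction overflowed: fall back to 64 bits.
        return delta < 0 ? 64 : bitsRequired(delta);
    }

    // Bits per value if each value is stored as an index into a table
    // of the distinct values (the low-cardinality case).
    static int bitsForTable(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        Set<Long> distinct = new TreeSet<>();
        for (long v : values) {
            distinct.add(v);
        }
        return bitsRequired(distinct.size() - 1);
    }

    public static void main(String[] args) {
        long[] smallRange = {0, 17, 128, 255};
        long[] fewDistinct = {1, 100, 10000, 100, 1, 10000};

        System.out.println("range 0..255, offset encoding: "
                + bitsForRange(smallRange) + " bits");
        System.out.println("values 1/100/10000, offset encoding: "
                + bitsForRange(fewDistinct) + " bits");
        System.out.println("values 1/100/10000, table encoding: "
                + bitsForTable(fewDistinct) + " bits");
    }
}
```

For the second data set the offset encoding is much wider than the table
encoding, which is why detecting low cardinality pays off.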

This makes doc values storage-efficient, and much more
memory-efficient than FieldCache, which people had to use until Lucene
4.0.

--
Adrien
