lucene-java-user mailing list archives

From Wei Wang <welshw...@gmail.com>
Subject Re: DocValues questions
Date Thu, 04 Apr 2013 22:58:42 GMT
Thanks! Good to know the codec uses a variable-length encoding mechanism here.

On Thu, Apr 4, 2013 at 3:36 PM, Adrien Grand <jpountz@gmail.com> wrote:

> On Thu, Apr 4, 2013 at 11:03 PM, Wei Wang <welshwang@gmail.com> wrote:
> > Given the new Lucene 4.2 DocValues API, it seems that whether the value is
> > a byte, short, int, or long, it is stored as a NumericDocValuesField. Does
> > this mean "long" values are always stored regardless of the initial type?
> > If so, do we still save space if the value range is small? Do we need to
> > give some hint to NumericDocValuesField to save space?
>
> Space savings are codec-dependent, but the default codecs use bit
> packing to save space. For example:
>  - if all your values are between 0 and 255, Lucene will only use 8
> bits per value on average,
>  - if your documents only have three distinct values 1, 100 and 10000,
> Lucene will detect that this is a low-cardinality field and only use 2
> bits per value on average.
>
> This makes doc values storage-efficient, and much more
> memory-efficient than FieldCache, which people had to use until Lucene
> 4.0.
>
> --
> Adrien

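To make the arithmetic in Adrien's two examples concrete, here is a minimal
Java sketch of how a bits-per-value figure can be derived. It is not Lucene's
codec code: the class BitPackingSketch and its methods are hypothetical names
invented for illustration, and the two strategies shown (packing the offset
from the minimum, and packing an index into a table of distinct values) are
only assumed to resemble what the default codec does.

```java
import java.util.TreeSet;

// Illustrative only: hypothetical helper, not part of Lucene.
public class BitPackingSketch {

    /** Bits needed to represent any value in [0, maxValue]; at least 1. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Range-based packing: store (value - min), so only the spread matters.
     *  Assumes a non-empty array. */
    static int bitsForRange(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return bitsRequired(max - min);
    }

    /** Table-based packing: store an index into the sorted distinct values.
     *  Assumes a non-empty array. */
    static int bitsForTable(long[] values) {
        TreeSet<Long> distinct = new TreeSet<>();
        for (long v : values) {
            distinct.add(v);
        }
        return bitsRequired(distinct.size() - 1);
    }

    public static void main(String[] args) {
        long[] smallRange = {0, 17, 128, 255};
        long[] lowCardinality = {1, 100, 10000, 100, 1, 10000};

        System.out.println(bitsForRange(smallRange));      // prints 8
        System.out.println(bitsForTable(lowCardinality));  // prints 2
        System.out.println(bitsForRange(lowCardinality));  // prints 14
    }
}
```

The last line shows why the low-cardinality case matters: packing the values
1, 100 and 10000 by range alone would need 14 bits each, while an index into a
three-entry table needs only 2.
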