lucene-java-user mailing list archives

From Wei Wang <welshw...@gmail.com>
Subject Re: DocValues space usage
Date Tue, 09 Apr 2013 17:12:15 GMT
Adrien and Robert, thanks a lot for the hints. I will try a few options and
see how it goes.

On Tue, Apr 9, 2013 at 9:25 AM, Robert Muir <rcmuir@gmail.com> wrote:

> On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand <jpountz@gmail.com> wrote:
>
> > The default codec stores numeric doc values in blocks of 4096 values,
> > each with its own number of bits per value. If most of these blocks
> > end up empty, doc values will require little space, but in the
> > worst-case scenario where each block contains a single value,
> > memory and disk usage will indeed be very inefficient.
> >
>
> Also the default codec has a performance hack (depending on
> acceptableOverHead) for optimizing the single byte case (e.g. norms or
> other smallfloat scoring factor). In this case it doesn't even use
> blockpackedwriter at all.
>
> That's why I recommended the diskdv codec instead... the concepts are the
> same, but it's not yet "optimized", so it's easier to understand what's
> going on :)
>
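
For anyone reading this thread later, here is a rough back-of-the-envelope sketch of the block behaviour Adrien describes. It is not Lucene code: the class and method names (BlockPackedEstimate, bitsRequired, estimateBits) are made up for illustration. It assumes missing documents are stored as 0, that each block of 4096 values is encoded as deltas from the block minimum, and it ignores per-block header overhead.

```java
final class BlockPackedEstimate {
    // Block size taken from the thread; everything else here is a simplified model.
    static final int BLOCK_SIZE = 4096;

    /** Bits needed to store any delta in [0, range]; 0 when the block is constant. */
    static int bitsRequired(long range) {
        return 64 - Long.numberOfLeadingZeros(range);
    }

    /** Estimated payload in bits: every slot in a block pays that block's bit width. */
    static long estimateBits(long[] values) {
        long total = 0;
        for (int start = 0; start < values.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, values.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            // Assumption: values are stored as (value - min) with a fixed width per block.
            total += (long) (end - start) * bitsRequired(max - min);
        }
        return total;
    }

    public static void main(String[] args) {
        int docs = 4 * BLOCK_SIZE;

        // Four documents with a value, all falling in the first block.
        long[] clustered = new long[docs];
        for (int i = 0; i < 4; i++) {
            clustered[i] = 1_000_000L;
        }

        // Four documents with a value, one per block (the worst case from the thread).
        long[] scattered = new long[docs];
        for (int b = 0; b < 4; b++) {
            scattered[b * BLOCK_SIZE] = 1_000_000L;
        }

        System.out.println("clustered: " + estimateBits(clustered) / 8 + " bytes");
        System.out.println("scattered: " + estimateBits(scattered) / 8 + " bytes");
    }
}
```

Under these assumptions the same four values cost about four times as much when they are spread one per block, because every slot in a block pays for the widest delta in that block, while an all-zero block costs nothing.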
