lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: DocValues space usage
Date Tue, 09 Apr 2013 16:25:28 GMT
On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand <jpountz@gmail.com> wrote:

> The default codec stores numeric doc values in blocks of 4096 values,
> each with its own number of bits per value. If most of these blocks
> end up empty, doc values will require little space, but in a
> worst-case scenario where each block contains a single value, it
> is true that memory and disk usage will be very inefficient.
>

Also, the default codec has a performance hack (depending on
acceptableOverHead) for optimizing the single-byte case (e.g. norms or
other smallfloat scoring factors). In this case it doesn't use
blockpackedwriter at all.

That's why I recommended the diskdv codec instead... the concepts are the
same, but it's not yet "optimized", so it's easier to understand what's
going on :)
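
To make the block-wise cost from the quoted text concrete, here is a rough sketch in plain Java. It is not Lucene's actual writer: the class and method names are hypothetical, it assumes documents without a value are stored as 0, and it counts only the packed payload (no per-block headers or metadata).

```java
/** Rough model of block-wise packing; NOT Lucene's actual implementation. */
final class BlockBitsSketch {
    // Block size taken from the quoted description of the default codec.
    static final int BLOCK_SIZE = 4096;

    /** Bits needed to store an unsigned range of the given width. */
    static int bitsRequired(long range) {
        // A negative input means (max - min) overflowed, which yields 64 bits.
        return range == 0 ? 0 : 64 - Long.numberOfLeadingZeros(range);
    }

    /**
     * Estimated payload size in bits, assuming each block stores
     * (value - blockMin) using one bit width shared by the whole block.
     */
    static long estimatePackedBits(long[] values) {
        long total = 0;
        for (int start = 0; start < values.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, values.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            // Every slot in the block pays the same width, including
            // documents that have no value (assumed to be stored as 0).
            total += (long) (end - start) * bitsRequired(max - min);
        }
        return total;
    }
}
```

Under these assumptions a block of 4096 zeros costs 0 bits, while a block holding a single value of 1,000,000 among zeros costs 4096 * 20 bits, i.e. about 10 KB for one real value. That is the worst case described above.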
