lucene-java-user mailing list archives

From Wei Wang <>
Subject DocValues space usage
Date Tue, 09 Apr 2013 15:22:57 GMT
DocValues make fast per-document value lookups possible, which is nice. But
they also raise some interesting issues.

Assume there are 100M docs and 200 NumericDocValuesFields. This ends up
using a huge amount of disk space and memory, even if there are only a few
thousand values for each field. I guess this is because Lucene stores a value
for each DocValues field of every document, using a variable-length codec.
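
To put rough numbers on this, here is a back-of-the-envelope sketch in plain
Java. It does not use any Lucene API. It assumes, as guessed above, one slot
per document per field, and it reads "thousands of values" as roughly 5,000
documents carrying a value in each field. The class name, the bits-per-value
figures, and the 12-byte sparse entry (4-byte docId plus 8-byte long) are all
illustrative assumptions, not measurements of the real codec.

```java
/** Hypothetical estimator; the sizes are assumptions, not Lucene's actual encoding. */
final class DocValuesSpaceEstimate {

    /** Number of (doc, field) slots if every document stores a value for every field. */
    static long denseSlots(long docCount, int fieldCount) {
        return docCount * fieldCount;
    }

    /** Bytes needed for dense storage at a fixed number of bits per slot. */
    static long denseBytes(long docCount, int fieldCount, int bitsPerValue) {
        return denseSlots(docCount, fieldCount) * bitsPerValue / 8;
    }

    /** Bytes needed if only documents that have a value are stored. */
    static long sparseBytes(long valuesPerField, int fieldCount, int bytesPerEntry) {
        return valuesPerField * fieldCount * bytesPerEntry;
    }

    public static void main(String[] args) {
        long docs = 100_000_000L;   // from the scenario above
        int fields = 200;           // from the scenario above
        long valuesPerField = 5_000L; // assumed reading of "thousands of values"

        System.out.println("dense slots:          " + denseSlots(docs, fields));
        System.out.println("dense bytes @ 1 bit:  " + denseBytes(docs, fields, 1));
        System.out.println("dense bytes @ 8 bits: " + denseBytes(docs, fields, 8));
        System.out.println("sparse bytes @ 12 B:  " + sparseBytes(valuesPerField, fields, 12));
    }
}
```

Under these assumptions the dense layout needs 20 billion slots, so even one
bit per slot costs about 2.5 GB, while the sparse layout would be on the order
of 12 MB.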

So in such a scenario, is it possible to store values only for the documents
that actually have a value for a given DocValues field? Or does Lucene have
a column storage mechanism, something like a hash map, for DocValues:

key: the docId that has a value for the DocValues field
value: the value of the DocValues field
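
To make the idea concrete, here is a minimal sketch of such a sparse column in
plain Java. It is not a Lucene class, and SparseNumericColumn and its methods
are hypothetical names. Instead of a literal hash map it keeps sorted parallel
arrays and looks up docIds by binary search, which avoids boxing; it assumes
docIds are added in strictly increasing order.

```java
import java.util.Arrays;

/** Hypothetical sparse per-field column: stores only docs that have a value. */
final class SparseNumericColumn {
    private int[] docIds = new int[16];
    private long[] values = new long[16];
    private int size = 0;

    /** Adds a value for a document; docIds must arrive in strictly increasing order. */
    void add(int docId, long value) {
        if (size > 0 && docId <= docIds[size - 1]) {
            throw new IllegalArgumentException("docIds must be strictly increasing");
        }
        if (size == docIds.length) {
            // Grow both arrays together so the indices stay aligned.
            docIds = Arrays.copyOf(docIds, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        docIds[size] = docId;
        values[size] = value;
        size++;
    }

    /** Returns true if the document has a value in this column. */
    boolean hasValue(int docId) {
        return Arrays.binarySearch(docIds, 0, size, docId) >= 0;
    }

    /** Returns the stored value, or missingValue if the document has none. */
    long get(int docId, long missingValue) {
        int idx = Arrays.binarySearch(docIds, 0, size, docId);
        return idx >= 0 ? values[idx] : missingValue;
    }

    /** Number of documents that have a value. */
    int size() {
        return size;
    }

    public static void main(String[] args) {
        SparseNumericColumn price = new SparseNumericColumn();
        price.add(3, 42L);
        price.add(70_000_000, 7L);
        System.out.println(price.get(3, -1L));          // doc with a value
        System.out.println(price.get(4, -1L));          // doc without a value
        System.out.println(price.hasValue(70_000_000));
    }
}
```

The trade-off is that a lookup costs O(log n) in the number of stored values
rather than a direct array read, in exchange for space proportional to the
documents that actually have a value.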

I am using Lucene 4.2.1.

