lucene-java-user mailing list archives

From Robert Muir
Subject Re: DiskDocValues vs Lucene42Codec
Date Fri, 08 Mar 2013 19:53:50 GMT
The underlying data formats are different. For example, because
Lucene42Codec loads terms into RAM, it uses an FST, whereas DiskDV
uses a simpler storage layout for the terms that is better suited to
being disk-resident.

There are also different compression block sizes and so on in use.

You can pick and choose the formats on a per-field basis, just as you
mentioned. In Solr it's also hooked into schema.xml, so you can set
docValuesFormat="Disk" as an attribute on the field type (similar to
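
A minimal sketch of how that schema.xml hookup might look, assuming Solr 4.2.
The type and field names (string_disk_dv, category) are made up for
illustration. Per-field formats in the schema are only honored when
solrconfig.xml uses the schema-aware codec factory, so that line is shown too.

```xml
<!-- solrconfig.xml: let the schema choose per-field formats -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: hypothetical field type whose doc values use the "Disk" format -->
<fieldType name="string_disk_dv" class="solr.StrField"
           docValuesFormat="Disk"/>

<!-- schema.xml: hypothetical field using that type.
     Assumption: in 4.2 a doc-values string field is single-valued and
     either required or given a default, hence required="true" here. -->
<field name="category" type="string_disk_dv"
       indexed="true" stored="false"
       docValues="true" required="true"/>
```

Fields whose type does not set docValuesFormat keep the codec's default
doc values format.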

On Fri, Mar 8, 2013 at 2:02 PM, David Smiley wrote:
> DiskDocValues is a codec (or part of a codec, apparently) for accessing the
> DocValues from disk, with minimal RAM usage for things like offsets.
> Lucene42Codec alternatively puts all of DocValues in RAM.  Is the actual
> disk-resident data format the same between them?  And how do you pick &
> choose the formats?  i.e. can I use Lucene42Codec for all the non-DocValues
> stuff but then use DiskDocValues so that I can let the OS's cache govern
> access to DV data while lowering my Java heap and giving the GC a break?  OK,
> I'm going to answer the 2nd question as I just discovered
> Lucene42Codec.getDocValuesFormatForField which I can customize.  But that
> still leaves the 1st question.  It would be nice to not have to re-index.
> ~ David
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: