lucene-java-user mailing list archives

From Mathias Lux <>
Subject Re: Stored fields: decompression slows down in my scenario ... any idea for a workaround?
Date Mon, 24 Jun 2013 16:13:43 GMT

Thanks again for all the help. Seems like the field compression allows
a huge step forward for my case. Here's some benchmarking for those of
you interested:

Each document consists of:
 * a StringField giving the actual image path
 * a single 64-byte feature (global OpponentHistogram)

number of documents in the index: 49,904

Search is based on linear loading of each stored feature field and
assessing distance to a query image.
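
For readers who want to picture the linear search described above, here is a minimal sketch in plain Java. It assumes the features have already been read into byte arrays, and it uses an L1 distance over unsigned bytes purely as a stand-in; the actual distance function for the OpponentHistogram is not stated in this message. The class and method names are hypothetical.

```java
// Hypothetical sketch: linear scan over fixed-size byte features.
final class LinearScanSketch {

    // L1 distance between two features of equal length.
    // Bytes are treated as unsigned values (0..255). This metric is an
    // assumption for illustration, not necessarily the one used here.
    static int l1Distance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature lengths differ");
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs((a[i] & 0xFF) - (b[i] & 0xFF));
        }
        return sum;
    }

    // Returns the index of the stored feature closest to the query,
    // or -1 if there are no stored features.
    static int nearest(byte[][] stored, byte[] query) {
        int best = -1;
        int bestDist = Integer.MAX_VALUE;
        for (int doc = 0; doc < stored.length; doc++) {
            int d = l1Distance(stored[doc], query);
            if (d < bestDist) {
                bestDist = d;
                best = doc;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[][] stored = new byte[3][64];      // three 64-byte features
        java.util.Arrays.fill(stored[1], (byte) 10);
        java.util.Arrays.fill(stored[2], (byte) 200);
        byte[] query = new byte[64];
        java.util.Arrays.fill(query, (byte) 12);
        System.out.println("nearest doc: " + nearest(stored, query));
    }
}
```

In the real setup every feature comes from a stored field of the index, so the cost of loading (and decompressing) each document dominates, which is what the numbers below measure.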

If I am using a custom codec:
Indexing is much faster, i.e. down to 9ms instead of 22ms per image.
With CompressionMode.FAST & 4k chunk size: search ~209ms
With CompressionMode.FAST_DECOMPRESSION & 4k chunk size: search ~175ms
With CompressionMode.FAST_DECOMPRESSION & 1k chunk size: search ~95ms
With CompressionMode.FAST_DECOMPRESSION & 512-byte chunk size: search ~83ms
With CompressionMode.FAST_DECOMPRESSION & 256-byte chunk size: search ~77ms

Original StoredField compression: search ~660ms
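
To put these timings in relation to each other, the small sketch below computes the speedup over the ~660ms baseline and the average time per document for the 49,904 documents. It only does arithmetic on the figures reported in this message; the class and method names are hypothetical. For the best configuration (~77ms) this works out to roughly 8.6 times faster than the original stored field compression.

```java
// Hypothetical helper: arithmetic on the timings reported in this message.
final class SpeedupSketch {

    // How many times faster a measured run is than the baseline.
    static double speedup(double baselineMs, double measuredMs) {
        return baselineMs / measuredMs;
    }

    // Average search time per document, in microseconds.
    static double perDocMicros(double searchMs, int numDocs) {
        return searchMs * 1000.0 / numDocs;
    }

    public static void main(String[] args) {
        double baselineMs = 660;                    // original StoredField compression
        int numDocs = 49904;                        // documents in the index
        double[] measuredMs = {209, 175, 95, 83, 77};
        for (double ms : measuredMs) {
            System.out.printf("%.0f ms: %.1fx faster, %.2f us per document%n",
                    ms, speedup(baselineMs, ms), perDocMicros(ms, numDocs));
        }
    }
}
```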

When searching with the features held in memory I came down to 44ms,
so 77ms is totally acceptable by comparison. My benchmarking
of the BinaryDocValuesField showed that it'd come close to the 44ms,
but I didn't go for a full evaluation as a lot of re-coding was


On Mon, Jun 24, 2013 at 3:13 PM, Adrien Grand <> wrote:
> Hi,
> On Mon, Jun 24, 2013 at 2:47 PM, Mathias Lux <> wrote:
>> Still, I've read that all the BinaryDocValues go directly to memory.
>> Am I right with this?
> It is true that the current default implementation stores them in
> memory. However, disk doc values formats can be configured on a
> per-field basis, so you could just write:
> Codec codec = new Lucene42Codec() {
>   final DiskDocValuesFormat diskDVF = new DiskDocValuesFormat();
>   @Override
>   public DocValuesFormat getDocValuesFormatForField(String fieldName) {
>     return diskDVF;
>   }
> };
> to store them on disk instead (add conditions on fieldName if you want
> to have different behaviors based on the field name).
>> I've also tried to change the codec, but I'm stuck with the
>> IndexReader. It throws
> This is because you defined a new custom codec (with a unique name to
> identify it) without registering it in
> META-INF/services/org.apache.lucene.codecs.Codec in your classpath. Note that
> the example above doesn't require you to register a different codec
> since it is fully compatible with Lucene42Codec and uses the same
> name.
>> Also I understand that the APIs are still experimental and in no way
>> stable. As I'm quite a lazy programmer I'd like to hear your opinion on
>> how stable the APIs for BinaryDocValues and Codec might be? :)
> I can't predict the future :), but given the time and energy that has
> been put into the doc values APIs for the 4.2 release (thanks
> Robert!), I'd say that they shouldn't change much in the next months.
> --
> Adrien

Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
