lucene-java-user mailing list archives

From Mathias Lux <m...@itec.uni-klu.ac.at>
Subject Re: Stored fields: decompression slows down in my scenario ... any idea for a workaround?
Date Mon, 24 Jun 2013 16:13:43 GMT
Hi!

Thanks again for all the help. It seems that tuning the stored field
compression is a huge step forward for my case. Here are some benchmark
numbers for those of you who are interested:

Each document consists of:
 * a StringField giving the actual image path
 * a single 64-byte feature (global OpponentHistogram)

number of documents in the index: 49,904

Search is a linear scan: each stored feature field is loaded in turn
and its distance to the query image is computed.
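
For illustration, the linear scan can be sketched roughly as follows.
This is a minimal sketch and not the code used for the benchmark: the
class LinearScan, its method names, and the choice of L1 distance are
assumptions, and the features are assumed to be already available as
byte arrays instead of being read from stored fields.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene or of the benchmark code.
final class LinearScan {

    // L1 (city-block) distance between two equal-length byte features.
    // The distance measure is an assumption; the thread does not name one.
    static int l1Distance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature lengths differ");
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask so that each byte is treated as an unsigned bin value (0..255).
            sum += Math.abs((a[i] & 0xFF) - (b[i] & 0xFF));
        }
        return sum;
    }

    // Visits every feature once and returns the index of the closest one,
    // or -1 if the list is empty.
    static int nearest(List<byte[]> features, byte[] query) {
        int best = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int doc = 0; doc < features.size(); doc++) {
            int d = l1Distance(features.get(doc), query);
            if (d < bestDistance) {
                bestDistance = d;
                best = doc;
            }
        }
        return best;
    }
}
```

Because every document is visited for every query, the cost of loading
a single stored field dominates the total search time.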

With a custom codec:
Indexing is much faster, i.e. down to 9ms instead of 22ms per image.
With CompressionMode.FAST & 4k chunk size: search ~209ms
With CompressionMode.FAST_DECOMPRESSION & 4k chunk size: search ~175ms
With CompressionMode.FAST_DECOMPRESSION & 1k chunk size: search ~95ms
With CompressionMode.FAST_DECOMPRESSION & 512-byte chunk size: search ~83ms
With CompressionMode.FAST_DECOMPRESSION & 256-byte chunk size: search ~77ms

Original StoredField compression: search ~660ms
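
One possible reading of why smaller chunks help is sketched below as a
rough cost model. It assumes that reading one stored document
decompresses its chunk from the start up to that document, so a
sequential scan decompresses the early documents of each chunk over
and over. That behaviour, the class ChunkCostModel, and the 128-byte
document size used in the note below are assumptions, not measurements
from this benchmark.

```java
// Rough cost model for illustration; names are hypothetical, not Lucene APIs.
final class ChunkCostModel {

    // Documents per chunk, assuming a chunk is closed once it holds
    // at least chunkSize bytes of fixed-size documents.
    static long docsPerChunk(int chunkSize, int docSize) {
        return Math.max(1, (chunkSize + docSize - 1) / docSize); // ceiling division
    }

    // Total bytes decompressed when reading numDocs documents one by one,
    // assuming that reading the j-th document of a chunk (1-based)
    // decompresses the first j documents of that chunk.
    static long bytesDecompressed(long numDocs, int chunkSize, int docSize) {
        long k = docsPerChunk(chunkSize, docSize);
        long fullChunks = numDocs / k;
        long rest = numDocs % k; // documents in the last, partial chunk
        long perFullChunk = docSize * k * (k + 1) / 2;
        long lastChunk = docSize * rest * (rest + 1) / 2;
        return fullChunks * perFullChunk + lastChunk;
    }
}
```

Under these assumptions a 4k chunk holds 32 documents of 128 bytes and
a 256-byte chunk holds 2, so a full scan decompresses roughly 16.5 and
1.5 times the raw data volume respectively. The measured timings differ
by less than that, which suggests that decompression is only one part
of the search cost.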

When searching with all features held in memory I get down to 44ms, so
77ms is perfectly acceptable by comparison. My benchmark of the
BinaryDocValuesField suggested that it would come close to the 44ms,
but I didn't do a full evaluation because it would have required a lot
of re-coding.

cheers,
   Mathias

On Mon, Jun 24, 2013 at 3:13 PM, Adrien Grand <jpountz@gmail.com> wrote:
> Hi,
>
> On Mon, Jun 24, 2013 at 2:47 PM, Mathias Lux <mlux@itec.uni-klu.ac.at> wrote:
>> Still, I've read that all the BinaryDocValues go directly to memory.
>> Am I right with this?
>
> It is true that the current default implementation stores them in
> memory. However, disk doc values formats can be configured on a
> per-field basis, so you could just write:
>
> Codec codec = new Lucene42Codec() {
>
>   final DiskDocValuesFormat diskDVF = new DiskDocValuesFormat();
>
>   @Override
>   public DocValuesFormat getDocValuesFormatForField(String fieldName) {
>     return diskDVF;
>   }
>
> };
>
> to store them on disk instead (add conditions on fieldName if you want
> to have different behaviors based on the field name).
>
>> I've also tried to change the codec, but I'm stuck with the
>> IndexReader. It throws
>
> This is because you defined a new custom codec (with a unique name to
> identify it) without registering it in
> META-INF/org.apache.lucene.codecs.Codec in your classpath. Note that
> the example above doesn't require you to register a different codec
> since it is fully compatible with Lucene42Codec and uses the same
> name.
>
>> Also I understand that the APIs are still experimental and in no way
>> stable. As I'm quite a lazy programmer I'd like to hear your opinion on
>> how stable the APIs for BinaryDocValues and Codec might be? :)
>
> I can't predict the future :), but given the time and energy that has
> been put into the doc values APIs for the 4.2 release (thanks
> Robert!), I'd say that they shouldn't change much in the next months.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>



-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
