From Paul Elschot <>
Subject Re: Possible IndexInput optimization
Date Sun, 29 Mar 2009 07:51:09 GMT

I have not experimented lately, but I'd like to add a general compressed
integer array to the basic types in an index, that would be compressed
on writing and decompressed on reading.
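
As a rough illustration of "compressed on writing, decompressed on
reading", here is a minimal sketch that uses plain variable-byte coding
(7 data bits per byte, high bit as a continuation flag) as a stand-in.
It is not the scheme from LUCENE-1410; the class name VByteIntArray and
both method names are hypothetical and exist only for this example.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of Lucene: variable-byte coding of an int array.
final class VByteIntArray {

    // Write each value low 7 bits first; the high bit marks "more bytes follow".
    static byte[] compress(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    // Read back exactly 'count' values; the caller must store the count itself.
    static int[] decompress(byte[] bytes, int count) {
        int[] values = new int[count];
        int p = 0;
        for (int i = 0; i < count; i++) {
            byte b = bytes[p++];
            int v = b & 0x7F;
            for (int shift = 7; b < 0; shift += 7) {
                b = bytes[p++];
                v |= (b & 0x7F) << shift;
            }
            values[i] = v;
        }
        return values;
    }
}
```

Small values take one byte and values below 2^14 take two, so the
gain depends entirely on how small the stored integers are.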

A first attempt is at LUCENE-1410, and one of the choices I had there
was whether or not to use NIO buffer methods on the index side.
I started there using these NIO buffer methods, but it seems that
the explicit byte arrays you're using here could be a good alternative.

I think my question boils down to whether or not these NIO buffers will
(in the end) get in the way of similar low level optimizations
you'd like to see applied here.

Paul Elschot

On Sunday 29 March 2009 00:43:28 Earwin Burrfoot wrote:
> While drooling over MappedBigByteBuffer, which we'll (hopefully) see
> in JDK7, I revisited my own Directory code and noticed a certain
> peculiarity, shared by Lucene core classes:
> Each and every IndexInput implementation only implements readByte()
> and readBytes(), never trying to override readInt/VInt/Long/etc
> methods.
> Currently RAMDirectory uses a list of byte arrays as a backing store,
> and I got some speedup when I switched to a custom version that knows
> each file's size beforehand and is thus able to allocate a single byte
> array (deliberately accepting a 2 GB file size limit) of exactly the
> needed length. Nothing strange here: the readByte(s) methods are easily
> the most frequently called ones in a Lucene app, and they were greatly
> simplified. readByte became merely:
> public byte readByte() throws IOException {
>     // I dropped bounds checking, relying on the natural ArrayIndexOOBE;
>     // we can't easily catch and recover from it anyway.
>     return buffer[position++];
> }
> But now, readInt is four readByte calls, readLong is two readInts (ten
> calls in total), readString - god knows how many. Unless you use a
> single type of Directory through the lifetime of your application,
> these readByte calls are never inlined, and the JIT's invokevirtual
> short-circuit optimization (it skips method lookup if it always finds
> the same target at a given call site) cannot be applied either.
> There are three cases where we can override the readNNN methods and
> provide implementations with zero or minimal method invocations:
> RAMDirectory, MMapDirectory and BufferedIndexInput for
> FSDirectory/CompoundFileReader. Has anybody tried this?
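
For the single-byte-array case described above, the readNNN overrides
might look roughly like the sketch below. It is an assumption-laden
illustration, not code from either poster: the class SingleArrayInput is
hypothetical and does not extend Lucene's IndexInput, the byte order is
assumed to be high byte first (what four successive readByte calls
produced in readInt), and the VInt layout is assumed to be 7 data bits
per byte with the high bit as a continuation flag. Bounds checking is
left to the natural ArrayIndexOutOfBoundsException, as in the quoted
readByte.

```java
// Hypothetical stand-in for a RAMDirectory-style input backed by one byte array.
final class SingleArrayInput {
    private final byte[] buffer;
    private int position;

    SingleArrayInput(byte[] buffer) {
        this.buffer = buffer;
    }

    byte readByte() {
        return buffer[position++];
    }

    // Reads four bytes directly from the array, with no readByte calls.
    int readInt() {
        final byte[] b = buffer;
        final int p = position;
        final int v = ((b[p] & 0xFF) << 24) | ((b[p + 1] & 0xFF) << 16)
                    | ((b[p + 2] & 0xFF) << 8) | (b[p + 3] & 0xFF);
        position = p + 4;
        return v;
    }

    // Reads eight bytes directly, instead of two readInts over eight readBytes.
    long readLong() {
        final byte[] b = buffer;
        final int p = position;
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[p + i] & 0xFF);
        }
        position = p + 8;
        return v;
    }

    // Variable-length int: low 7 bits first, a negative byte means "continue".
    int readVInt() {
        final byte[] b = buffer;
        int p = position;
        byte x = b[p++];
        int v = x & 0x7F;
        for (int shift = 7; x < 0; shift += 7) {
            x = b[p++];
            v |= (x & 0x7F) << shift;
        }
        position = p;
        return v;
    }
}
```

Each method touches the array directly and writes the position back
once, so no virtual call is left for the JIT to resolve inside
readInt, readLong or readVInt.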
> -- 
> Kirill Zakharenko/Кирилл Захаренко
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785