lucene-dev mailing list archives

From Earwin Burrfoot
Subject Possible IndexInput optimization
Date Sat, 28 Mar 2009 23:43:28 GMT
While drooling over MappedBigByteBuffer, which we'll (hopefully) see
in JDK7, I revisited my own Directory code and noticed a certain
peculiarity, shared by Lucene core classes:
Each and every IndexInput implementation only implements readByte()
and readBytes(), never trying to override readInt/VInt/Long/etc.

Currently RAMDirectory uses a list of byte arrays as a backing store,
and I got some speedup when I switched to a custom version that knows each
file's size beforehand and is thus able to allocate a single byte array
(deliberately accepting a 2 GB file size limitation) of exactly the needed
length. Nothing strange here: the readByte(s) methods are easily the most
often called ones in a Lucene app, and they were greatly simplified.
readByte became merely:
public byte readByte() throws IOException {
    // I dropped bounds checking, relying on the natural
    // ArrayIndexOutOfBoundsException; we can't easily catch and recover from it.
    return buffer[position++];
}

But now, readInt is four readByte calls, readLong is two readInts (ten
calls in total), readString - god knows how many. Unless you use a
single type of Directory throughout the lifetime of your application,
these readByte calls are never inlined, and the JIT's invokevirtual
short-circuit optimization (it skips the method lookup if it always finds
the same target at this exact call site) cannot be applied either.
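
To make the call counting concrete, here is a minimal sketch of how composed reads fan out into readByte calls. ComposedInput, CountingArrayInput and the readByteCalls counter are hypothetical names invented for illustration, not Lucene classes, and the checked IOException is left out to keep the sketch short.

```java
// Hypothetical stand-in for an IndexInput-style base class (not Lucene source).
abstract class ComposedInput {
    int readByteCalls = 0; // illustration only: counts primitive reads

    abstract byte readByte();

    // One int costs four virtual readByte() calls (big-endian order).
    int readInt() {
        return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
             | ((readByte() & 0xFF) << 8)  |  (readByte() & 0xFF);
    }

    // One long costs two readInt() calls, i.e. eight readByte() calls underneath.
    long readLong() {
        return (((long) readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
    }
}

// Array-backed subclass that implements only readByte(), as described above.
final class CountingArrayInput extends ComposedInput {
    private final byte[] buffer;
    private int position;

    CountingArrayInput(byte[] buffer) { this.buffer = buffer; }

    @Override
    byte readByte() {
        readByteCalls++;
        return buffer[position++]; // no explicit bounds check
    }
}
```

With a second ComposedInput subclass loaded in the same application, the readByte() call sites inside readInt() may see more than one receiver type, which is the situation where inlining becomes harder for the JIT.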

There are three cases when we can override readNNN methods and provide
implementations with zero or minimum method invocations -
RAMDirectory, MMapDirectory and BufferedIndexInput for
FSDirectory/CompoundFileReader. Anybody tried this?
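
As a rough sketch of what such overrides could look like for the single-array case, the class below reads multi-byte values straight from the backing array. SingleArrayInput and its fields are hypothetical names, not existing Lucene API; the sketch assumes big-endian ints and longs and the 7-bits-per-byte VInt encoding with the high bit as a continuation flag, and it omits the checked IOException for brevity.

```java
// Hypothetical single-array input; names are assumptions, not Lucene API.
final class SingleArrayInput {
    private final byte[] buffer;
    private int position;

    SingleArrayInput(byte[] buffer) { this.buffer = buffer; }

    byte readByte() {
        return buffer[position++];
    }

    // Big-endian int assembled directly from the array, no nested calls.
    int readInt() {
        final byte[] b = buffer;
        final int p = position;
        int v = ((b[p] & 0xFF) << 24) | ((b[p + 1] & 0xFF) << 16)
              | ((b[p + 2] & 0xFF) << 8) | (b[p + 3] & 0xFF);
        position = p + 4; // advance only after a successful read
        return v;
    }

    // Big-endian long, again without delegating to readInt()/readByte().
    long readLong() {
        final byte[] b = buffer;
        final int p = position;
        long v = ((long) (b[p] & 0xFF) << 56) | ((long) (b[p + 1] & 0xFF) << 48)
               | ((long) (b[p + 2] & 0xFF) << 40) | ((long) (b[p + 3] & 0xFF) << 32)
               | ((long) (b[p + 4] & 0xFF) << 24) | ((long) (b[p + 5] & 0xFF) << 16)
               | ((long) (b[p + 6] & 0xFF) << 8) | (b[p + 7] & 0xFFL);
        position = p + 8;
        return v;
    }

    // Variable-length int: low 7 bits first, high bit set means "more bytes follow".
    int readVInt() {
        final byte[] b = buffer;
        int p = position;
        byte x = b[p++];
        int v = x & 0x7F;
        for (int shift = 7; (x & 0x80) != 0; shift += 7) {
            x = b[p++];
            v |= (x & 0x7F) << shift;
        }
        position = p;
        return v;
    }
}
```

Whether this actually beats the composed versions would need measuring on a real index; the sketch only shows that each read can be done with a single method invocation.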

Kirill Zakharenko/Кирилл Захаренко
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
