jackrabbit-oak-dev mailing list archives

From Mete Atamel <mata...@adobe.com>
Subject [MongoMK] Reading blobs incrementally
Date Wed, 26 Sep 2012 07:38:19 GMT

I realized that MicroKernelIT#testBlobs takes a while to complete on
MongoMK. This is partly due to how the test was written and partly due to
how the blob read offset is implemented in MongoMK. I'm looking for
feedback on where to fix this.

To give you an idea of what testBlobs does: it first writes a blob using MK.
Then it verifies that the blob bytes were written correctly by reading the
blob back from MK. However, the read from MK is not done in one shot.
Instead, it's done via this input stream:

InputStream in2 = new BufferedInputStream(new MicroKernelInputStream(mk,

MicroKernelInputStream reads from the MK and BufferedInputStream buffers
the reads in 8K chunks. Then, there's a while loop with in2.read() to read
the blob fully. Each time the buffer is refilled, this calls the
MicroKernel#read method with the right offset, so there is one call per 8K
chunk until the blob bytes are fully read.
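
For illustration, here is a minimal sketch of that read pattern. It is not
the test code itself: OffsetReader is a hypothetical stand-in for an
offset-based read such as MicroKernel#read, OffsetInputStream is a
hypothetical stand-in for MicroKernelInputStream, and the blob lives in a
plain byte array. It only counts how many offset reads a byte-by-byte loop
causes once BufferedInputStream (default 8K buffer) is in between.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ChunkedBlobRead {

    /** Hypothetical stand-in for an offset-based blob read such as MicroKernel#read. */
    interface OffsetReader {
        int read(long pos, byte[] buf, int off, int len);
    }

    /** Hypothetical stream that issues one offset-based call per read request. */
    static class OffsetInputStream extends InputStream {
        private final OffsetReader reader;
        private long pos; // offset of the next byte to request

        OffsetInputStream(OffsetReader reader) {
            this.reader = reader;
        }

        @Override
        public int read() {
            byte[] one = new byte[1];
            int n = read(one, 0, 1);
            return n <= 0 ? -1 : one[0] & 0xff;
        }

        @Override
        public int read(byte[] buf, int off, int len) {
            if (len == 0) {
                return 0;
            }
            int n = reader.read(pos, buf, off, len);
            if (n > 0) {
                pos += n;
            }
            return n;
        }
    }

    /** Drains the blob one byte at a time and returns the number of offset reads issued. */
    static int countBackendCalls(byte[] blob) {
        int[] calls = {0};
        OffsetReader reader = (pos, buf, off, len) -> {
            calls[0]++;
            if (pos >= blob.length) {
                return -1; // end of blob
            }
            int n = (int) Math.min(len, blob.length - pos);
            System.arraycopy(blob, (int) pos, buf, off, n);
            return n;
        };
        try (InputStream in = new BufferedInputStream(new OffsetInputStream(reader))) {
            while (in.read() != -1) {
                // drain; the buffered stream refills itself in 8K requests
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("offset reads for a 64K blob: " + countBackendCalls(new byte[64 * 1024]));
    }
}
```

The number of offset reads grows linearly with the blob size, which is
harmless on its own; it only hurts when each offset read is expensive.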

This is not a problem for small blobs, but for bigger blobs, reading in 8K
chunks can be slow because in MongoMK, every read with an offset triggers
the following:
- Find the blob in GridFS
- Retrieve its input stream
- Skip to the right offset
- Read 8K
- Close the input stream
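
The steps above can be sketched roughly as follows. This is an assumption
about the shape of the code, not the actual MongoMK implementation: STORE is
a hypothetical in-memory map standing in for GridFS, and a
ByteArrayInputStream stands in for the GridFS input stream. The
streamsOpened counter is only there to make the per-read cost visible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class PerReadCost {

    /** Hypothetical in-memory stand-in for the blob store (GridFS in MongoMK). */
    static final Map<String, byte[]> STORE = new HashMap<>();

    /** Number of streams opened so far. */
    static int streamsOpened = 0;

    /** One offset read that follows the five steps: find, open, skip, read, close. */
    static int readAt(String blobId, long pos, byte[] buf, int off, int len) {
        byte[] blob = STORE.get(blobId);                        // find the blob
        if (blob == null) {
            throw new IllegalArgumentException("unknown blob: " + blobId);
        }
        streamsOpened++;
        try (InputStream in = new ByteArrayInputStream(blob)) { // retrieve its input stream
            long toSkip = pos;
            while (toSkip > 0) {                                // skip to the right offset
                long skipped = in.skip(toSkip);
                if (skipped <= 0) {
                    return -1;                                  // offset is past the end
                }
                toSkip -= skipped;
            }
            return in.read(buf, off, len);                      // read up to len bytes
        } catch (IOException e) {                               // stream is closed on exit
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        STORE.put("blob", new byte[64 * 1024]);
        byte[] buf = new byte[8192];
        long pos = 0;
        int n;
        while ((n = readAt("blob", pos, buf, 0, buf.length)) > 0) {
            pos += n;
        }
        System.out.println("bytes read: " + pos + ", streams opened: " + streamsOpened);
    }
}
```

Every call pays for a fresh lookup and a fresh stream, and the skip has to
cover a longer distance as the offset grows.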

I could fix this by changing the test to read the blob bytes in one shot
and then do the comparison. However, I was wondering if we should also
work on an optimization for successive reads from the blob with
incremental offsets? Maybe we could keep the input stream of recently read
blobs around for some time before closing them?
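
To make the second idea concrete, here is a rough sketch of what such a
cache could look like. Everything in it is hypothetical: CachedBlobReader,
the opener function (which stands in for "find the blob and retrieve its
input stream") and the idle timeout are names I made up for illustration,
not existing MongoMK code. The stream is reused only when the requested
offset continues exactly where the previous read stopped; anything else
falls back to open + skip.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

class CachedBlobReader {

    /** An open stream plus the blob offset of the next byte it will return. */
    private static final class Entry {
        final InputStream in;
        long pos;
        long lastUsedMillis;

        Entry(InputStream in) {
            this.in = in;
        }
    }

    private final Function<String, InputStream> opener; // hypothetical: opens a blob stream by id
    private final long idleTimeoutMillis;
    private final Map<String, Entry> open = new HashMap<>();
    int streamsOpened = 0; // for illustration only

    CachedBlobReader(Function<String, InputStream> opener, long idleTimeoutMillis) {
        this.opener = opener;
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    synchronized int read(String blobId, long pos, byte[] buf, int off, int len) {
        long now = System.currentTimeMillis();
        evictIdle(now);
        try {
            Entry e = open.remove(blobId);
            if (e != null && e.pos != pos) {
                e.in.close(); // not a sequential continuation, so discard the stream
                e = null;
            }
            if (e == null) {
                e = new Entry(opener.apply(blobId));
                streamsOpened++;
                long toSkip = pos;
                while (toSkip > 0) {
                    long skipped = e.in.skip(toSkip);
                    if (skipped <= 0) { // treated as "offset past the end" in this sketch
                        e.in.close();
                        return -1;
                    }
                    toSkip -= skipped;
                }
                e.pos = pos;
            }
            int n = e.in.read(buf, off, len);
            if (n < 0) {
                e.in.close(); // fully read, nothing left to keep around
                return n;
            }
            e.pos += n;
            e.lastUsedMillis = now;
            open.put(blobId, e); // keep the stream for the next incremental read
            return n;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Closes streams that have not been used within the idle timeout. */
    private void evictIdle(long now) {
        for (Iterator<Entry> it = open.values().iterator(); it.hasNext();) {
            Entry e = it.next();
            if (now - e.lastUsedMillis > idleTimeoutMillis) {
                it.remove();
                try {
                    e.in.close();
                } catch (IOException ignored) {
                    // best effort
                }
            }
        }
    }

    public static void main(String[] args) {
        byte[] blob = new byte[64 * 1024];
        CachedBlobReader reader = new CachedBlobReader(id -> new ByteArrayInputStream(blob), 60_000);
        byte[] buf = new byte[8192];
        long pos = 0;
        int n;
        while ((n = reader.read("blob", pos, buf, 0, buf.length)) > 0) {
            pos += n;
        }
        System.out.println("bytes read: " + pos + ", streams opened: " + reader.streamsOpened);
    }
}
```

In this sketch idle streams are only closed when the next read comes in, and
a stream is leaked if a read throws midway. A real implementation would
probably need a background cleanup, a cap on the number of open streams, and
some thought about several readers working on the same blob.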

