jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: [MongoMK] Reading blobs incrementally
Date Wed, 17 Oct 2012 08:42:32 GMT

I wonder why the MicroKernel API has an asymmetry here: for writing a
binary you can pass a stream, whereas for reading you need to pass a
byte array.
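
To make the asymmetry concrete, here is a minimal sketch. BlobApi is a
hypothetical, simplified interface (not the actual MicroKernel signatures),
InMemoryBlobs is a toy implementation that exists only to make the example
runnable, and BlobInputStream is an illustrative adapter showing how a
caller can get a stream for reading by tracking the offset itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class BlobApiAsymmetry {

    /** Hypothetical, simplified blob API: stream in, byte array with offset out. */
    interface BlobApi {
        String write(InputStream in);
        int read(String blobId, long pos, byte[] buf, int off, int len);
        long getLength(String blobId);
    }

    /** Toy in-memory implementation, only here to make the sketch runnable. */
    static final class InMemoryBlobs implements BlobApi {
        private final Map<String, byte[]> blobs = new HashMap<>();

        public String write(InputStream in) {
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] tmp = new byte[4096];
                for (int n; (n = in.read(tmp)) != -1; ) {
                    out.write(tmp, 0, n);
                }
                String id = "blob-" + blobs.size();
                blobs.put(id, out.toByteArray());
                return id;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public int read(String blobId, long pos, byte[] buf, int off, int len) {
            byte[] data = blobs.get(blobId);
            if (pos >= data.length) {
                return -1; // offset at or past the end of the blob
            }
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            return n;
        }

        public long getLength(String blobId) {
            return blobs.get(blobId).length;
        }
    }

    /** Adapter: exposes the offset-based read as an InputStream by remembering the position. */
    static final class BlobInputStream extends InputStream {
        private final BlobApi api;
        private final String id;
        private long pos;

        BlobInputStream(BlobApi api, String id) {
            this.api = api;
            this.id = id;
        }

        @Override
        public int read() {
            byte[] one = new byte[1];
            return read(one, 0, 1) == -1 ? -1 : one[0] & 0xff;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) {
                return 0;
            }
            int n = api.read(id, pos, b, off, len);
            if (n > 0) {
                pos += n; // the next call continues where this one stopped
            }
            return n;
        }
    }

    public static void main(String[] args) {
        InMemoryBlobs api = new InMemoryBlobs();
        byte[] data = "hello blob".getBytes(StandardCharsets.UTF_8);
        String id = api.write(new ByteArrayInputStream(data));

        BlobInputStream in = new BlobInputStream(api, id);
        byte[] back = new byte[data.length];
        int n = in.read(back, 0, back.length);
        System.out.println(n + " bytes read: " + new String(back, 0, n, StandardCharsets.UTF_8));
    }
}
```

With a stream-returning read in the API itself, callers would not need an
adapter like this, and an implementation could keep its own cursor
between reads.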


On 26.9.12 8:38, Mete Atamel wrote:
> Hi,
> I realized that MicroKernelIT#testBlobs takes a while to complete on
> MongoMK. This is partly due to how the test was written and partly due to
> how the blob read offset is implemented in MongoMK. I'm looking for
> feedback on where to fix this.
> To give you an idea of testBlobs, it first writes a blob using MK. Then,
> it verifies that the blob bytes were written correctly by reading the blob
> from MK. However, the blob read from MK is not done in one shot. Instead,
> it's done via this input stream:
> InputStream in2 = new BufferedInputStream(new MicroKernelInputStream(mk, id));
> MicroKernelInputStream reads from the MK and BufferedInputStream buffers
> the reads in 8K chunks. Then, there's a while loop with in2.read() to read
> the blob fully. This makes a call to the MicroKernel#read method with the
> right offset for every 8K chunk until the blob bytes are fully read.
> This is not a problem for small blob sizes but for bigger blob sizes,
> reading 8K chunks can be slow because in MongoMK, every read with offset
> triggers the following:
> - Find the blob in GridFS
> - Retrieve its input stream
> - Skip to the right offset
> - Read 8K
> - Close the input stream
> I could fix this by changing the test to read the blob bytes in one shot
> and then do the comparison. However, I was wondering if we should also
> work on an optimization for successive reads from the blob with
> incremental offsets? Maybe we could keep the input stream of recently read
> blobs around for some time before closing them?
> Best,
> Mete
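
To illustrate the cost Mete describes, here is a rough sketch that does not
touch MongoMK or GridFS at all. CountingStore, OffsetReader, and
SequentialReader are hypothetical names made up for this example, and a
ByteArrayInputStream stands in for the stored blob. It compares a stateless
read (open, skip, read one chunk, close on every call) with a reader that
keeps the stream open as long as each requested offset is the next
sequential one, which is one possible reading of "keep the input stream
around".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class IncrementalBlobRead {

    /** Hypothetical stand-in for a blob store; counts opened streams and skipped bytes. */
    static final class CountingStore {
        final byte[] blob;
        int streamsOpened = 0;
        long bytesSkipped = 0;

        CountingStore(byte[] blob) {
            this.blob = blob;
        }

        /** Opens a fresh stream and skips to pos; returns null if pos is past the end. */
        InputStream openAt(long pos) throws IOException {
            streamsOpened++;
            InputStream in = new ByteArrayInputStream(blob);
            long skipped = in.skip(pos);
            bytesSkipped += skipped;
            if (skipped < pos) {
                in.close();
                return null;
            }
            return in;
        }
    }

    /** A read-with-offset call, shaped loosely like the byte-array read discussed above. */
    interface OffsetReader {
        int read(long pos, byte[] buf, int off, int len) throws IOException;
    }

    /** Stateless variant: open, skip, read one chunk, close, on every call. */
    static OffsetReader stateless(CountingStore store) {
        return (pos, buf, off, len) -> {
            try (InputStream in = store.openAt(pos)) {
                return in == null ? -1 : in.read(buf, off, len);
            }
        };
    }

    /** Keeps the stream open and reuses it while the offsets stay sequential. */
    static final class SequentialReader implements OffsetReader {
        private final CountingStore store;
        private InputStream in;
        private long nextPos;

        SequentialReader(CountingStore store) {
            this.store = store;
        }

        @Override
        public int read(long pos, byte[] buf, int off, int len) throws IOException {
            if (in == null || pos != nextPos) {
                close(); // non-sequential offset: fall back to reopen and skip
                in = store.openAt(pos);
                if (in == null) {
                    return -1;
                }
                nextPos = pos;
            }
            int n = in.read(buf, off, len);
            if (n > 0) {
                nextPos += n;
            }
            return n;
        }

        void close() throws IOException {
            if (in != null) {
                in.close();
                in = null;
            }
        }
    }

    /** Reads size bytes in chunk-sized calls with increasing offsets. */
    static byte[] readAll(OffsetReader reader, int size, int chunk) {
        byte[] out = new byte[size];
        int pos = 0;
        try {
            while (pos < size) {
                int n = reader.read(pos, out, pos, Math.min(chunk, size - pos));
                if (n <= 0) {
                    break;
                }
                pos += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] blob = new byte[1 << 20]; // 1 MiB, read in 8K chunks
        for (int i = 0; i < blob.length; i++) {
            blob[i] = (byte) i;
        }
        CountingStore a = new CountingStore(blob);
        byte[] viaStateless = readAll(stateless(a), blob.length, 8192);
        CountingStore b = new CountingStore(blob);
        byte[] viaSequential = readAll(new SequentialReader(b), blob.length, 8192);

        System.out.println("stateless:  opened=" + a.streamsOpened + " skipped=" + a.bytesSkipped
                + " ok=" + Arrays.equals(blob, viaStateless));
        System.out.println("sequential: opened=" + b.streamsOpened + " skipped=" + b.bytesSkipped
                + " ok=" + Arrays.equals(blob, viaSequential));
    }
}
```

In the stateless variant the skipped bytes grow quadratically with the blob
size, since chunk i has to skip over the i chunks before it. A real cache of
open streams would also need some eviction or timeout policy, which this
sketch leaves out.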
