jackrabbit-oak-dev mailing list archives

From Stefan Guggisberg <stefan.guggisb...@gmail.com>
Subject Re: [MongoMK] Reading blobs incrementally
Date Wed, 17 Oct 2012 09:20:43 GMT
On Wed, Oct 17, 2012 at 11:08 AM, Michael Dürig <mduerig@apache.org> wrote:
>
>
> On 17.10.12 10:03, Stefan Guggisberg wrote:
>>
>> On Wed, Oct 17, 2012 at 10:42 AM, Michael Dürig <mduerig@apache.org>
>> wrote:
>>>
>>>
>>> I wonder why the MicroKernel API has an asymmetry here: for writing a
>>> binary you can pass a stream, whereas for reading you need to pass a
>>> byte array.
>>
>>
>> the write method implies a content-addressable storage for blobs,
>> i.e. identical binary content is identified by identical identifiers.
>> the identifier
>> needs to be computed from the entire blob content. that's why the
>> signature takes
>> a stream rather than supporting chunked writes.
>
>
> Makes sense so far but this is only half of the story ;-) Why couldn't the
> read method also return a stream?

it could, but then why should it? for cosmetic reasons? personally
i prefer the current signature for its cleaner semantics and ease of
implementation.

cheers
stefan
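The content-addressable write that Stefan describes above can be sketched roughly as follows. This is a minimal illustration, not the actual MongoMK implementation: the `idFor` helper and the choice of SHA-256 are assumptions made here purely to show why the identifier must be computed from the entire stream before an id can be returned.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentAddressedId {

    // Hash the whole stream to derive a content-based identifier.
    // The id cannot be known until every byte has been consumed,
    // which is why a write API like this takes an InputStream
    // rather than supporting chunked/partial writes.
    static String idFor(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] content = "hello blob".getBytes("UTF-8");
        String id1 = idFor(new ByteArrayInputStream(content));
        String id2 = idFor(new ByteArrayInputStream(content));
        // Identical binary content maps to the identical identifier.
        System.out.println(id1.equals(id2)); // prints "true"
    }
}
```

Because identical content always hashes to the same id, a second write of the same blob can be deduplicated by the store.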

>
> Michael
>
>
>>
>> cheers
>> stefan
>>
>>>
>>> Michael
>>>
>>>
>>> On 26.9.12 8:38, Mete Atamel wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I realized that MicroKernelIT#testBlobs takes a while to complete on
>>>> MongoMK. This is partly due to how the test was written and partly due
>>>> to
>>>> how the blob read offset is implemented in MongoMK. I'm looking for
>>>> feedback on where to fix this.
>>>>
>>>> To give you an idea on testBlobs, it first writes a blob using MK. Then,
>>>> it verifies that the blob bytes were written correctly by reading the
>>>> blob
>>>> from MK. However, the blob is not read from MK in one shot. Instead,
>>>> it's done via this input stream:
>>>>
>>>> InputStream in2 = new BufferedInputStream(new MicroKernelInputStream(mk,
>>>> id));
>>>>
>>>>
>>>> MicroKernelInputStream reads from the MK and BufferedInputStream buffers
>>>> the reads in 8K chunks. Then, there's a while loop with in2.read() to
>>>> read
>>>> the blob fully. This makes a call to MicroKernel#read method with the
>>>> right offset for every 8K chunk until the blob bytes are fully read.
>>>>
>>>> This is not a problem for small blob sizes but for bigger blob sizes,
>>>> reading 8K chunks can be slow because in MongoMK, every read with offset
>>>> triggers the following:
>>>> -Find the blob from GridFS
>>>> -Retrieve its input stream
>>>> -Skip to the right offset
>>>> -Read 8K
>>>> -Close the input stream
>>>>
>>>> I could fix this by changing the test to read the blob bytes in one shot
>>>> and then do the comparison. However, I was wondering whether we should
>>>> also work on an optimization for successive reads from the blob with
>>>> incremental offsets. Maybe we could keep the input streams of recently
>>>> read blobs around for some time before closing them?
>>>>
>>>> Best,
>>>> Mete
>>>>
>>>>
>>>
>
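The chunked-read pattern Mete describes (one MicroKernel#read call per 8K chunk, each with an incremented offset) can be sketched as below. The `BlobReader` interface and its in-memory implementation are hypothetical stand-ins invented for this illustration; the real MongoMK would hit GridFS on every `read` call, which is exactly the cost being discussed.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ChunkedBlobRead {

    // Hypothetical stand-in for the offset-based read contract
    // discussed in the thread (not the actual MicroKernel interface).
    interface BlobReader {
        int read(String blobId, long pos, byte[] buff, int off, int length);
    }

    // In-memory implementation; MongoMK would instead locate the blob
    // in GridFS, open its stream, skip to the offset, read, and close,
    // on every single call.
    static class MemoryBlobReader implements BlobReader {
        final Map<String, byte[]> store = new HashMap<>();

        public int read(String blobId, long pos, byte[] buff, int off, int length) {
            byte[] data = store.get(blobId);
            if (pos >= data.length) {
                return -1; // past end of blob
            }
            int n = Math.min(length, data.length - (int) pos);
            System.arraycopy(data, (int) pos, buff, off, n);
            return n;
        }
    }

    // The loop that MicroKernelInputStream wrapped in a BufferedInputStream
    // effectively performs: one read(id, offset, ...) per 8K chunk until
    // the blob is exhausted.
    static byte[] readFully(BlobReader mk, String id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        long pos = 0;
        int n;
        while ((n = mk.read(id, pos, chunk, 0, chunk.length)) > 0) {
            out.write(chunk, 0, n);
            pos += n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        MemoryBlobReader mk = new MemoryBlobReader();
        byte[] blob = new byte[20000];
        for (int i = 0; i < blob.length; i++) {
            blob[i] = (byte) (i % 251);
        }
        mk.store.put("blob1", blob);
        byte[] read = readFully(mk, "blob1");
        // Three read calls (8192 + 8192 + 3616 bytes) reassemble the blob.
        System.out.println(Arrays.equals(blob, read)); // prints "true"
    }
}
```

Mete's proposed optimization would amount to keeping the underlying stream and its current position cached per blob between calls, so that a read at `pos` immediately following a read that ended at `pos` can continue without re-opening and re-skipping.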
