Mailing-List: contact oak-dev-help@jackrabbit.apache.org; run by ezmlm
Reply-To: oak-dev@jackrabbit.apache.org
Delivered-To: mailing list oak-dev@jackrabbit.apache.org
Message-ID: <507E7586.4080609@apache.org>
Date: Wed, 17 Oct 2012 10:08:22 +0100
From: Michael Dürig
To: oak-dev@jackrabbit.apache.org
Subject: Re: [MongoMK] Reading blobs incrementally
References: <507E6F78.5030602@apache.org>

On 17.10.12 10:03, Stefan Guggisberg wrote:
> On Wed, Oct 17, 2012 at 10:42 AM, Michael Dürig wrote:
>>
>> I wonder why the MicroKernel API has an asymmetry here: for writing a
>> binary you can pass a stream, whereas for reading you need to pass a
>> byte array.
>
> the write method implies content-addressable storage for blobs,
> i.e. identical binary content is identified by identical identifiers.
> the identifier needs to be computed from the entire blob content.
> that's why the signature takes a stream rather than supporting
> chunked writes.

Makes sense so far, but this is only half of the story ;-) Why couldn't
the read method also return a stream?

Michael

>
> cheers
> stefan
>
>>
>> Michael
>>
>>
>> On 26.9.12 8:38, Mete Atamel wrote:
>>>
>>> Hi,
>>>
>>> I realized that MicroKernelIT#testBlobs takes a while to complete on
>>> MongoMK. This is partly due to how the test was written and partly
>>> due to how the blob read offset is implemented in MongoMK. I'm
>>> looking for feedback on where to fix this.
>>>
>>> To give you an idea of testBlobs: it first writes a blob using the
>>> MK. Then it verifies that the blob bytes were written correctly by
>>> reading the blob back from the MK. However, the blob is not read
>>> from the MK in one shot.
>>> Instead, it's done via this input stream:
>>>
>>>   InputStream in2 = new BufferedInputStream(
>>>       new MicroKernelInputStream(mk, id));
>>>
>>> MicroKernelInputStream reads from the MK, and BufferedInputStream
>>> buffers the reads in 8K chunks. Then there's a while loop with
>>> in2.read() to read the blob fully. This makes a call to the
>>> MicroKernel#read method with the right offset for every 8K chunk
>>> until the blob bytes are fully read.
>>>
>>> This is not a problem for small blob sizes, but for bigger blobs
>>> reading in 8K chunks can be slow, because in MongoMK every read
>>> with an offset triggers the following:
>>> - Find the blob in GridFS
>>> - Retrieve its input stream
>>> - Skip to the right offset
>>> - Read 8K
>>> - Close the input stream
>>>
>>> I could fix this by changing the test to read the blob bytes in one
>>> shot and then do the comparison. However, I was wondering whether
>>> we should also work on an optimization for successive reads from
>>> the same blob with incremental offsets? Maybe we could keep the
>>> input streams of recently read blobs around for some time before
>>> closing them?
>>>
>>> Best,
>>> Mete
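[Editorial note: the chunked read pattern discussed above can be sketched as below. This is a minimal, self-contained illustration, not the real Oak code: `MkBlobStore` and its in-memory stub are hypothetical stand-ins, and only the `read(blobId, pos, buff, off, length)` shape mirrors the actual `MicroKernel#read` signature. Each loop iteration corresponds to one offset-based read call, which in MongoMK would mean one full GridFS lookup/skip/close cycle.]

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ChunkedBlobRead {

    /** Hypothetical stand-in for the MicroKernel blob-read API. */
    interface MkBlobStore {
        /** Reads up to length bytes of the blob into buff; returns bytes read, or -1 at EOF. */
        int read(String blobId, long pos, byte[] buff, int off, int length);
    }

    /** Reads a whole blob in 8K chunks, as BufferedInputStream over MicroKernelInputStream does. */
    static byte[] readFully(MkBlobStore mk, String blobId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        long pos = 0;
        int n;
        // One MicroKernel#read-style call per chunk, with an incremented offset each time.
        while ((n = mk.read(blobId, pos, chunk, 0, chunk.length)) > 0) {
            out.write(chunk, 0, n);
            pos += n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        // In-memory stub; a real MongoMK would hit GridFS on every one of these calls.
        MkBlobStore mk = (id, pos, buff, off, len) -> {
            if (pos >= data.length) {
                return -1;
            }
            int n = Math.min(len, data.length - (int) pos);
            System.arraycopy(data, (int) pos, buff, off, n);
            return n;
        };
        byte[] read = readFully(mk, "blob-1");
        System.out.println(Arrays.equals(read, data)); // prints "true"
    }
}
```

A 20000-byte blob takes three such calls (8192 + 8192 + 3616 bytes), which is why per-call overhead dominates for large blobs and why caching the underlying stream between successive offset reads could help.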