From: Stefan Guggisberg
To: oak-dev@jackrabbit.apache.org
Date: Wed, 17 Oct 2012 11:20:43 +0200
Subject: Re: [MongoMK] Reading blobs incrementally
In-Reply-To: <507E7586.4080609@apache.org>
References: <507E6F78.5030602@apache.org> <507E7586.4080609@apache.org>

On Wed, Oct 17, 2012 at 11:08 AM, Michael Dürig wrote:
>
>
> On 17.10.12 10:03, Stefan Guggisberg wrote:
>>
>> On Wed, Oct 17, 2012 at 10:42 AM, Michael Dürig wrote:
>>>
>>>
>>> I wonder why the Microkernel API has an asymmetry here: for writing a
>>> binary you can pass a stream whereas for reading you need to pass a
>>> byte array.
>>
>>
>> the write method implies a content-addressable storage for blobs,
>> i.e. identical binary content is identified by identical identifiers.
>> the identifier needs to be computed from the entire blob content.
>> that's why the signature takes a stream rather than supporting
>> chunked writes.
>
>
> Makes sense so far but this is only half of the story ;-) Why couldn't the
> read method also return a stream?

it could, but then why should it? for cosmetic reasons? personally i
prefer the current signature for cleaner semantics and ease of implementation.

cheers
stefan

>
> Michael
>
>
>>
>> cheers
>> stefan
>>
>>>
>>> Michael
>>>
>>>
>>> On 26.9.12 8:38, Mete Atamel wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I realized that MicroKernelIT#testBlobs takes a while to complete on
>>>> MongoMK. This is partly due to how the test was written and partly due
>>>> to how the blob read offset is implemented in MongoMK.
>>>> I'm looking for feedback on where to fix this.
>>>>
>>>> To give you an idea on testBlobs, it first writes a blob using MK. Then,
>>>> it verifies that the blob bytes were written correctly by reading the
>>>> blob from MK. However, blob read from MK is not done in one shot.
>>>> Instead, it's done via this input stream:
>>>>
>>>> InputStream in2 = new BufferedInputStream(new MicroKernelInputStream(mk, id));
>>>>
>>>>
>>>> MicroKernelInputStream reads from the MK and BufferedInputStream buffers
>>>> the reads in 8K chunks. Then, there's a while loop with in2.read() to
>>>> read the blob fully. This makes a call to MicroKernel#read method with
>>>> the right offset for every 8K chunk until the blob bytes are fully read.
>>>>
>>>> This is not a problem for small blob sizes but for bigger blob sizes,
>>>> reading 8K chunks can be slow because in MongoMK, every read with offset
>>>> triggers the following:
>>>> -Find the blob from GridFS
>>>> -Retrieve its input stream
>>>> -Skip to the right offset
>>>> -Read 8K
>>>> -Close the input stream
>>>>
>>>> I could fix this by changing the test to read the blob bytes in one shot
>>>> and then do the comparison. However, I was wondering if we should also
>>>> work on an optimization for successive reads from the blob with
>>>> incremental offsets? Maybe we could keep the input stream of recently
>>>> read blobs around for some time before closing them?
>>>>
>>>> Best,
>>>> Mete
>>>>
>>>>
>>>
>
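
To make the cost of the per-read steps listed in the original message concrete, here is a minimal sketch in plain Java. It is not MongoMK or GridFS code: the class ChunkedReadCost, its counters, and the in-memory byte array standing in for the blob store are all assumptions made for illustration. Each read call opens a fresh stream, skips to the offset, reads one chunk, and closes the stream, so the number of bytes skipped grows roughly quadratically with the number of chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a blob store; not the MongoMK or GridFS API. */
class ChunkedReadCost {
    private final byte[] blob;   // in-memory blob used instead of a real store
    long bytesSkipped = 0;       // total bytes skipped across all read calls
    int streamsOpened = 0;       // number of streams opened (one per read call)

    ChunkedReadCost(byte[] blob) {
        this.blob = blob;
    }

    /** Mimics the steps from the message: open a stream, skip, read, close. */
    int read(long offset, byte[] buf, int off, int len) throws IOException {
        try (InputStream in = new ByteArrayInputStream(blob)) {
            streamsOpened++;
            bytesSkipped += in.skip(offset);
            return in.read(buf, off, len);
        }
    }

    /** Reads the whole blob in pieces of chunkSize bytes with increasing offsets. */
    byte[] readAll(int chunkSize) throws IOException {
        byte[] out = new byte[blob.length];
        int pos = 0;
        while (pos < out.length) {
            int n = read(pos, out, pos, Math.min(chunkSize, out.length - pos));
            if (n < 0) {
                break; // end of blob reached earlier than expected
            }
            pos += n;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1024 * 1024]; // 1 MiB blob, chosen arbitrarily

        ChunkedReadCost chunked = new ChunkedReadCost(data);
        chunked.readAll(8 * 1024);           // 8K chunks, as in the test
        System.out.println("8K chunks: streams=" + chunked.streamsOpened
                + " skipped=" + chunked.bytesSkipped);

        ChunkedReadCost oneShot = new ChunkedReadCost(data);
        oneShot.readAll(data.length);        // single read of the whole blob
        System.out.println("one shot:  streams=" + oneShot.streamsOpened
                + " skipped=" + oneShot.bytesSkipped);
    }
}
```

In this model a 1 MiB blob read in 8K chunks opens 128 streams, while a one-shot read opens one and skips nothing. Keeping a recently used stream open between successive reads, as suggested in the original message, would remove the repeated skip as long as the offsets keep increasing.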