From: Mete Atamel
To: oak-dev@jackrabbit.apache.org
Date: Wed, 17 Oct 2012 02:02:23 -0700
Subject: Re: [MongoMK] Reading blobs incrementally

Thanks for the feedback. Using AbstractBlobStore instead of GridFS is
indeed on the list of things I want to try out once the rest of the
missing functionality is done in MongoMK. I'll report back once I get a
chance to implement that. To keep this thread concrete, I've appended two
rough sketches at the end of this mail: one of the block-splitting
approach and one comparing the two blob read patterns from testBlobs.

-Mete

On 10/17/12 10:26 AM, "Thomas Mueller" wrote:

>Hi,
>
>As a workaround, you could keep the last few streams open in MongoMK for
>some time (a cache), together with their current positions. That way a
>seek is not required in most cases, as binaries are usually read as a
>stream.
>
>However, keeping resources open is problematic (we do that in the
>DbDataStore in Jackrabbit, and we ran into various problems), so I would
>avoid it if possible.
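>
>Just to make the idea concrete, here is a minimal sketch of such a
>stream cache. The names (BlobStreamCache, openBlobStream) are made up
>and the error handling is minimal; this is an illustration, not actual
>MongoMK code:
>
>import java.io.IOException;
>import java.io.InputStream;
>import java.util.LinkedHashMap;
>import java.util.Map;
>
>class BlobStreamCache {
>
>    // An open stream together with its current read position.
>    static class Entry {
>        final InputStream in;
>        long pos;
>        Entry(InputStream in) { this.in = in; }
>    }
>
>    private static final int MAX_OPEN = 16;
>
>    // LRU map; evicting the eldest entry closes its stream.
>    private final Map<String, Entry> streams =
>        new LinkedHashMap<String, Entry>(MAX_OPEN, 0.75f, true) {
>            protected boolean removeEldestEntry(
>                    Map.Entry<String, Entry> eldest) {
>                if (size() > MAX_OPEN) {
>                    try { eldest.getValue().in.close(); }
>                    catch (IOException ignore) { }
>                    return true;
>                }
>                return false;
>            }
>        };
>
>    // Reads up to buff.length bytes at the given offset. A sequential
>    // read continues on the cached stream and needs no skip at all;
>    // only a backwards seek forces the stream to be reopened.
>    synchronized int read(String blobId, long pos, byte[] buff)
>            throws IOException {
>        Entry e = streams.get(blobId);
>        if (e == null || e.pos > pos) {
>            if (e != null) {
>                e.in.close();
>            }
>            e = new Entry(openBlobStream(blobId));
>            streams.put(blobId, e);
>        }
>        while (e.pos < pos) {
>            long skipped = e.in.skip(pos - e.pos);
>            if (skipped <= 0) {
>                throw new IOException("Cannot seek to " + pos);
>            }
>            e.pos += skipped;
>        }
>        int n = e.in.read(buff);
>        if (n > 0) {
>            e.pos += n;
>        }
>        return n;
>    }
>
>    // Backend-specific: would open the blob (e.g. from GridFS)
>    // positioned at offset 0.
>    private InputStream openBlobStream(String blobId) throws IOException {
>        throw new UnsupportedOperationException("not implemented here");
>    }
>}
>
>The tricky part is exactly the resource management mentioned above:
>streams that are never read to the end still have to be closed at some
>point, which is what the LRU eviction tries to approximate.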
>I would probably use the AbstractBlobStore instead, which splits blobs
>into blocks. I believe that way you can just use regular MongoDB
>features and don't need to use GridFS. But you might want to test which
>approach is faster / easier.
>
>Regards,
>Thomas
>
>
>
>On 9/26/12 9:48 AM, "Mete Atamel" wrote:
>
>>Forgot to mention: I could also increase the BufferedInputStream's
>>buffer size to something high to speed up the large blob read. That's
>>probably what I'll do in the short term, but my question is more about
>>whether the optimization I mentioned in my previous email is worth
>>pursuing at some point.
>>
>>Best,
>>Mete
>>
>>On 9/26/12 9:38 AM, "Mete Atamel" wrote:
>>
>>>Hi,
>>>
>>>I realized that MicroKernelIT#testBlobs takes a while to complete on
>>>MongoMK. This is partly due to how the test was written and partly due
>>>to how the blob read offset is implemented in MongoMK. I'm looking for
>>>feedback on where to fix this.
>>>
>>>To give you an idea of testBlobs: it first writes a blob using the MK.
>>>Then it verifies that the blob bytes were written correctly by reading
>>>the blob back from the MK. However, the blob is not read from the MK
>>>in one shot. Instead, it's read via this input stream:
>>>
>>>InputStream in2 = new BufferedInputStream(
>>>        new MicroKernelInputStream(mk, id));
>>>
>>>MicroKernelInputStream reads from the MK, and BufferedInputStream
>>>buffers the reads in 8K chunks. Then there's a while loop with
>>>in2.read() to read the blob fully. This makes a call to the
>>>MicroKernel#read method with the right offset for every 8K chunk until
>>>the blob bytes are fully read.
>>>
>>>This is not a problem for small blobs, but for bigger blobs reading in
>>>8K chunks can be slow, because in MongoMK every read at an offset
>>>triggers the following:
>>>- Find the blob in GridFS
>>>- Retrieve its input stream
>>>- Skip to the right offset
>>>- Read 8K
>>>- Close the input stream
>>>
>>>I could fix this by changing the test to read the blob bytes in one
>>>shot and then do the comparison. However, I was wondering whether we
>>>should also work on an optimization for successive reads from the same
>>>blob with incremental offsets. Maybe we could keep the input streams
>>>of recently read blobs around for some time before closing them?
>>>
>>>Best,
>>>Mete
>>>
>>>
>>
>
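
P.S. Here is the rough sketch of the block-splitting approach Thomas
describes above, using the plain MongoDB Java driver (2.x API). The
class, collection and field names are made up for illustration; a real
implementation would live behind the AbstractBlobStore API and need
proper error handling. Each blob is cut into fixed-size blocks, and
every block is stored as a regular document keyed by its content hash,
so identical blocks are stored only once:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;

import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class BlockStoreSketch {

    // 2 MB blocks keep each document well under MongoDB's 16 MB
    // document size limit.
    private static final int BLOCK_SIZE = 2 * 1024 * 1024;

    private final DBCollection blocks; // e.g. db.getCollection("blobBlocks")

    BlockStoreSketch(DBCollection blocks) {
        this.blocks = blocks;
    }

    // Splits the stream into blocks and stores each block as a plain
    // document keyed by its SHA-256 hash. The list of keys identifies
    // the blob.
    List<String> writeBlob(InputStream in) throws IOException {
        List<String> keys = new ArrayList<String>();
        byte[] buff = new byte[BLOCK_SIZE];
        int n;
        while ((n = readFully(in, buff)) > 0) {
            byte[] block = new byte[n];
            System.arraycopy(buff, 0, block, 0, n);
            String key = hash(block);
            // Upsert, so a block shared by several blobs is stored once.
            blocks.update(new BasicDBObject("_id", key),
                    new BasicDBObject("_id", key).append("data", block),
                    true, false);
            keys.add(key);
        }
        return keys;
    }

    // A read at an offset only fetches the block(s) overlapping the
    // requested range (block index = offset / BLOCK_SIZE); no stream
    // has to be opened, skipped and closed per 8K chunk.
    byte[] readBlock(String key) {
        DBObject doc = blocks.findOne(new BasicDBObject("_id", key));
        return doc == null ? null : (byte[]) doc.get("data");
    }

    private static int readFully(InputStream in, byte[] buff)
            throws IOException {
        int total = 0;
        while (total < buff.length) {
            int n = in.read(buff, total, buff.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    private static String hash(byte[] block) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(block)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}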
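
P.P.S. And the two blob read patterns from the testBlobs discussion, as
fragments (assuming "mk" is the MicroKernel and "id" is the blob id, and
the MicroKernel#read(blobId, pos, buff, off, length) / getLength(blobId)
contract):

// Pattern 1: roughly what the test does today. BufferedInputStream
// issues one MicroKernel#read per 8K chunk, and with the current
// GridFS-backed implementation each of those reads re-opens the blob
// stream and skips to 'pos', so the total work grows roughly
// quadratically with the blob size.
long len = mk.getLength(id);
byte[] chunk = new byte[8 * 1024];
long pos = 0;
while (pos < len) {
    int n = mk.read(id, pos, chunk, 0, chunk.length);
    if (n < 0) {
        break;
    }
    // ... verify the chunk bytes here ...
    pos += n;
}

// Pattern 2: the one-shot read the test could be changed to. One
// open/skip/close per blob, at the price of holding the whole blob in
// memory, which is acceptable for a test.
byte[] all = new byte[(int) len];
int done = 0;
while (done < all.length) {
    int n = mk.read(id, done, all, done, all.length - done);
    if (n < 0) {
        break;
    }
    done += n;
}
// ... compare 'all' against the bytes that were written ...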