In-Reply-To: <261cf6280907021442w48882a1fqd563fdf4af39486d@mail.gmail.com>
References: <20090702112455.GA25891@uk.tiscali.com>
 <20090702132047.GA27924@uk.tiscali.com>
 <20090702143834.GA5202@uk.tiscali.com>
 <060429AB-10DE-482B-8993-27FBD486F191@jasondavies.com>
 <4C82520A-A9E1-4F84-8630-0ED3810D8E61@apache.org>
 <77950CB1-716C-4C5D-B97E-3C398F728E15@apache.org>
 <261cf6280907021415x3746a14cm4cbe0cf914a951cf@mail.gmail.com>
 <261cf6280907021442w48882a1fqd563fdf4af39486d@mail.gmail.com>
Date: Thu, 2 Jul 2009 14:50:30 -0700
Message-ID: <261cf6280907021450x48c8d5beob8b8c79e4091b31f@mail.gmail.com>
Subject: Re: View Performance (was Re: The 1.0 Thread)
From: Scott Shumaker
To: dev@couchdb.apache.org

One question, though: Why are the emitted view results stored as Erlang
terms, as opposed to storing the JSON returned from the view server -
which is what you'll be serving to the clients anyway? If you skipped
the reverse JSON->Erlang encoding, and additionally stored a cached JSON
copy of each document alongside the document whenever a document in
CouchDB was created/updated (which you could generate incrementally in a
separate Erlang process so you don't slow down writes), you could just
pass this JSON copy to the view and basically eliminate the JSON->Erlang
conversion overhead entirely (since it would only be done
asynchronously).

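A minimal sketch of the cached-JSON idea, in Python rather than Erlang
for brevity. Everything here (the `CachedJsonStore` class, its method
names, the version counter) is hypothetical and only illustrates the
shape of the idea; it is not how CouchDB stores documents. Writes keep
the native form and return at once, a background worker builds the JSON
copy, and readers get the cached string when it is ready.

```python
import json
import queue
import threading


class CachedJsonStore:
    """Toy store: native document plus an asynchronously built JSON copy.

    Hypothetical illustration only; not CouchDB's storage layer.
    """

    def __init__(self):
        self._docs = {}        # doc_id -> native dict (stands in for the Erlang term)
        self._versions = {}    # doc_id -> write counter, used to detect stale work
        self._json_cache = {}  # doc_id -> pre-serialized JSON string
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        threading.Thread(target=self._encode_loop, daemon=True).start()

    def put(self, doc_id, doc):
        """Store the native form and return; JSON encoding happens later."""
        with self._lock:
            version = self._versions.get(doc_id, 0) + 1
            self._versions[doc_id] = version
            self._docs[doc_id] = doc
            self._json_cache.pop(doc_id, None)  # drop the stale cached copy
        self._pending.put((doc_id, version))

    def _encode_loop(self):
        """Background worker: serialize queued documents off the write path."""
        while True:
            doc_id, version = self._pending.get()
            try:
                with self._lock:
                    current = self._versions.get(doc_id) == version
                    doc = self._docs.get(doc_id) if current else None
                if doc is not None:
                    encoded = json.dumps(doc)
                    with self._lock:
                        # Cache only if no newer write arrived meanwhile.
                        if self._versions.get(doc_id) == version:
                            self._json_cache[doc_id] = encoded
            finally:
                self._pending.task_done()

    def get_json(self, doc_id):
        """Return the cached JSON if ready, else serialize on demand."""
        with self._lock:
            cached = self._json_cache.get(doc_id)
            doc = self._docs[doc_id]
        return cached if cached is not None else json.dumps(doc)

    def wait_until_encoded(self):
        """Block until every queued document has been processed."""
        self._pending.join()
```

A caller would `put` documents as usual and hand `get_json(doc_id)`
straight to whatever consumes JSON; the on-demand fallback only runs if
the worker has not caught up yet.
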
Even if you need to store the emitted view results back into Erlang,
you could have a special optimization case for emitting (key, doc) -
because you already have the document as both Erlang and JSON (assuming
you were storing cached JSON copies). And include_docs would get faster
since you wouldn't need to do the JSON conversion there either.

Just a thought.

Scott

On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker wrote:
> I should mention that we tend to emit (doc._id, doc) in our views - as
> opposed to doc._id, null and using include_docs - because we found
> that doc._id,null gave us a 30% speedup on building the views, but
> cost us about the same on each additional hit to the view.
>
> Scott
>
> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker wrote:
>> We see times that are considerably worse. We mostly have maps - very
>> few reduces. We have 40k objects, about 25 design docs, and 90 views.
>> Although we're about to change the code to auto-generate the design
>> docs based on the view filters used (re: view filter patch) - see if
>> that helps.
>>
>> Maybe it's because we have larger objects - but re-indexing a typical
>> new view takes > 5 minutes (with view filtering off). Some are worse.
>> With view filtering on some can be quite fast - some views finish in
>> like 10 seconds. Interestingly, reindexing all views takes about an
>> hour - with or without view filtering. I'm guessing that a
>> substantial part of the bottleneck is erlang -> json serialization.
>> Many of our objects are heavily nested structures and exceed 10k in
>> size. One other note - when we tried dropping in the optimized
>> 'main.js' posted on the mailing list, we saw an overall 20% speedup.
>> Unfortunately, it wasn't compatible with the authentication stuff, and
>> the deployment was a bit wacky, so we're holding off on that right
>> now.
>>
>>
>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz wrote:
>>>
>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
>>>
>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz wrote:
>>>>>
>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
>>>>>
>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
>>>>>>
>>>>>>> For some fruit that was so low-hanging that I nearly stubbed my
>>>>>>> toe on it,
>>>>>>> see https://issues.apache.org/jira/browse/COUCHDB-399
>>>>>>
>>>>>>
>>>>>> Nice work! I'd be interested to see what kind of performance
>>>>>> increase we get from Spidermonkey 1.8.1, which comes with native
>>>>>> JSON parsing/encoding.
>>>>>> See here for details:
>>>>>> https://developer.mozilla.org/En/Using_native_JSON
>>>>>>
>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
>>>>>
>>>>> I'm not sure the new engine is such a no-brainer. One thing about
>>>>> the new generation of JS VMs is we've seen greatly increased memory
>>>>> usage with earlier versions. Also the startup times might be
>>>>> longer, or shorter.
>>>>>
>>>>> Though I wonder if this can be improved by forking a JS process
>>>>> rather than spawning a new process.
>>>>>
>>>>
>>>> Memory usage is a definite concern. I'm not sure I follow why startup
>>>> times would be important though. Am I missing something?
>>>
>>> Start up time isn't a huge concern, but it's is a something to
>>> consider. On a heavily loaded system, scripts that normally work
>>> might start to time out, requiring restarting the process. Lots of
>>> restarts may start to eat lots cpu and memory IO.
>>>
>>> -Damien
>>>
>>>
>>>>
>>>>> -Damien
>>>>>
>>>>>> --
>>>>>> Jason Davies
>>>>>>
>>>>>> www.jasondavies.com
>>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>