From: Scott Shumaker
To: dev@couchdb.apache.org
Date: Sat, 4 Jul 2009 15:26:23 -0700
Subject: Re: View Performance (was Re: The 1.0 Thread)
Message-ID: <261cf6280907041526n1d798770l16a0f4ec9f74e702@mail.gmail.com>
In-Reply-To: <261cf6280907041139p73f673eehe422f9df9abe88b6@mail.gmail.com>

Ok - here are some more detailed stats:

Note that this is couch-0.9.0 with HiPE enabled and the filter patch,
on my MacBook Pro.

~53K db documents, ~1500 are type:restaurant

We tested using Brian's bork.rb:

no filtering:
bork.rb - returning no values = 68s
bork.rb - returning 5 values per map(doc) call = 200s
couchjs - returning no values = 93s
couchjs - one doc emitted per type:restaurant = 104s

w/ filtering: (select ~1500 docs out of 53K)
couchjs - returning no values = 8.9s
couchjs - one doc emitted per type:restaurant = 19s

Couple of notes:
53K docs apparently take 68s to be converted to JSON and received by
the dummy server (with no docs emitted) - or about 780 docs/second.
couchjs is slower than bork.rb in this case (unsurprising - bork.rb is
not really parsing the data)
filtering on the couch side is an enormous win for our test case.
K/V inserts - (5*53K in (200-68)s) = ~2000 per second
This is a pretty big difference from Brian's results (8000/sec),
although we're dealing with many more docs, and without comparing
hardware specs, it's difficult to draw conclusions.

On Sat, Jul 4, 2009 at 11:39 AM, Scott Shumaker wrote:
> Compiling with HiPE didn't seem to make any difference in performance. :(
>
> On Thu, Jul 2, 2009 at 4:17 PM, Scott Shumaker wrote:
>> I'll try that out tomorrow and post the results here.
>>
>> On Thu, Jul 2, 2009 at 3:01 PM, Paul Davis wrote:
>>> On Thu, Jul 2, 2009 at 5:50 PM, Scott Shumaker wrote:
>>>> One question, though: Why are the emitted view results stored as
>>>> erlang terms, as opposed to storing the JSON returned from the view
>>>> server - which is what you'll be serving to the clients anyway?
>>>>
>>>> If you skipped the reverse json->erlang encoding, and additionally
>>>> stored a cached json copy of each document alongside the document
>>>> whenever a document in couchdb was created/updated (which you could
>>>> incrementally generate in a separate erlang process so you don't have
>>>> to slow down write performance) - and just pass this json copy to the
>>>> view, you could basically eliminate the json->erlang conversion
>>>> overhead entirely (since it would only be done asynchronously).
>>>>
>>>> Even if you need to store the emitted view results back into erlang,
>>>> you could have a special optimization case for emitting (key, doc) -
>>>> because you already have the document as both erlang/json (assuming
>>>> you were storing cached json copies). And include_docs would get
>>>> faster since you wouldn't need to do the json conversion there either.
>>>>
>>>> Just a thought.
>>>>
>>>
>>> Premature optimization is the root of all evil?
>>> Have you tried compiling CouchDB with HiPE enabled? I'm inclined to
>>> agree with you that the large JSON values are probably a significant
>>> cause here. Assuming your Erlang is HiPE enabled you can do something
>>> like this to compile CouchDB:
>>>
>>>    $ ./bootstrap
>>>    $ ERLC_FLAGS="+native +inline +inline_list_funcs" ./configure
>>>    $ make
>>>    $ sudo make install
>>>
>>>
>>>> Scott
>>>>
>>>> On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker wrote:
>>>>> I should mention that we tend to emit (doc._id, doc) in our views - as
>>>>> opposed to doc._id, null and using include_docs - because we found
>>>>> that doc._id,null gave us a 30% speedup on building the views, but
>>>>> cost us about the same on each additional hit to the view.
>>>>>
>>>>> Scott
>>>>>
>>>>> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker wrote:
>>>>>> We see times that are considerably worse. We mostly have maps - very
>>>>>> few reduces. We have 40k objects, about 25 design docs, and 90 views.
>>>>>> Although we're about to change the code to auto-generate the design
>>>>>> docs based on the view filters used (re: view filter patch) - see if
>>>>>> that helps.
>>>>>>
>>>>>> Maybe it's because we have larger objects - but re-indexing a typical
>>>>>> new view takes > 5 minutes (with view filtering off). Some are worse.
>>>>>> With view filtering on some can be quite fast - some views finish in
>>>>>> like 10 seconds. Interestingly, reindexing all views takes about an
>>>>>> hour - with or without view filtering. I'm guessing that a
>>>>>> substantial part of the bottleneck is erlang -> json serialization.
>>>>>> Many of our objects are heavily nested structures and exceed 10k in
>>>>>> size. One other note - when we tried dropping in the optimized
>>>>>> 'main.js' posted on the mailing list, we saw an overall 20% speedup.
>>>>>> Unfortunately, it wasn't compatible with the authentication stuff, and
>>>>>> the deployment was a bit wacky, so we're holding off on that right
>>>>>> now.
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz wrote:
>>>>>>>
>>>>>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
>>>>>>>
>>>>>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz wrote:
>>>>>>>>>
>>>>>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
>>>>>>>>>
>>>>>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
>>>>>>>>>>
>>>>>>>>>>> For some fruit that was so low-hanging that I nearly stubbed
>>>>>>>>>>> my toe on it, see
>>>>>>>>>>> https://issues.apache.org/jira/browse/COUCHDB-399
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nice work! I'd be interested to see what kind of performance
>>>>>>>>>> increase we get from Spidermonkey 1.8.1, which comes with
>>>>>>>>>> native JSON parsing/encoding. See here for details:
>>>>>>>>>> https://developer.mozilla.org/En/Using_native_JSON
>>>>>>>>>>
>>>>>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
>>>>>>>>>
>>>>>>>>> I'm not sure the new engine is such a no-brainer. One thing about
>>>>>>>>> the new generation of JS VMs is we've seen greatly increased
>>>>>>>>> memory usage with earlier versions. Also the startup times might
>>>>>>>>> be longer, or shorter.
>>>>>>>>>
>>>>>>>>> Though I wonder if this can be improved by forking a JS process
>>>>>>>>> rather than spawning a new process.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Memory usage is a definite concern. I'm not sure I follow why
>>>>>>>> startup times would be important though. Am I missing something?
>>>>>>>
>>>>>>> Start up time isn't a huge concern, but it is something to consider.
>>>>>>> On a heavily loaded system, scripts that normally work might start
>>>>>>> to time out, requiring restarting the process. Lots of restarts may
>>>>>>> start to eat lots of CPU and memory IO.
>>>>>>>
>>>>>>> -Damien
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> -Damien
>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jason Davies
>>>>>>>>>>
>>>>>>>>>> www.jasondavies.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
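
For reference, the throughput figures in the notes at the top of this message can be re-derived from the raw timings. The following is only a rough sketch in JavaScript: the perSecond helper and the variable names are made up for illustration, and "~53K" is assumed to mean exactly 53000 documents, which the message only gives approximately.

```javascript
// Timings in seconds, copied from the stats at the top of this message.
// ASSUMPTION: "~53K db documents" is treated as exactly 53000.
const docs = 53000;
const timings = {
  borkNoValues: 68,           // bork.rb, no filtering, nothing emitted
  borkFiveValues: 200,        // bork.rb, no filtering, 5 values per map(doc)
  couchjsNoValues: 93,        // couchjs, no filtering, nothing emitted
  couchjsFilteredNoValues: 8.9, // couchjs, with filtering, nothing emitted
};

// Hypothetical helper (not part of CouchDB): items handled per second.
function perSecond(count, seconds) {
  return count / seconds;
}

// Cost of converting docs to JSON and shipping them to the dummy view server.
const docsPerSecond = perSecond(docs, timings.borkNoValues);

// Extra time spent when 5 key/value pairs are emitted per document,
// attributed entirely to K/V inserts.
const kvInsertsPerSecond = perSecond(
  5 * docs,
  timings.borkFiveValues - timings.borkNoValues
);

// How much faster couchjs finishes when Couch filters the docs first.
const filterSpeedup = timings.couchjsNoValues / timings.couchjsFilteredNoValues;

console.log(Math.round(docsPerSecond));      // 779  (quoted above as ~780)
console.log(Math.round(kvInsertsPerSecond)); // 2008 (quoted above as ~2000)
console.log(filterSpeedup.toFixed(1));       // 10.4
```

The K/V figure subtracts the 68s baseline so that only the time added by emitting values is counted. It still lumps together view-server output, JSON decoding and B-tree writes, so it is an upper bound on per-insert cost rather than a precise measurement.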