From: Paul Davis
To: user@couchdb.apache.org
Date: Wed, 1 Apr 2009 13:43:50 -0400
Subject: Re: Suggestions on View performance optimization/improvement

On Wed, Apr 1, 2009 at 1:31 PM, Jason Smith wrote:
> I'd be very interested to know the performance impact of that optimization
> as well. What is the overhead or bottleneck with large view values?
> Estimating 100 bytes per key/value pair within each of the million
> documents, that's 2GB of raw data, which should write to a laptop disk
> within 2 minutes.
>
> I'm wondering whether it matters how large the view values are, since they
> would seem not to be involved in the view processing very much--only written
> to disk in the order defined by the keys.
>
> Of course, that goes against the common wisdom that the fastest thing to do
> is emit(key, null); but that could impact the application significantly
> since you have to query again for the documents. (I'm unsure whether
> include_docs has a performance penalty either.)
>
> I guess what I'm asking is, why does the value side of views impact
> performance so greatly?
>

Other than the extra disk I/O, there's also the necessary
term_to_binary calls that add overhead.
No idea how much that factors in though.

For include_docs, you're adding an N log(N) cost to reading from the
view, but I haven't the slightest idea how that might translate to wall
clock time. It'd be an interesting thing to measure though. If you
include_docs for M rows, with M going from 1 to 1M or so, you could then
find the math to get an approximation for how long it takes to read
through the tree. I'm almost tempted but work is calling.

That said, the best way to find out would be to measure. I don't think
I've seen numbers on this yet, so anything you can show would definitely
be valuable information. Especially if you can demonstrate the
trade-offs between emit(key, doc) and emit(key, null) with
include_docs=true.

HTH,
Paul Davis

> kowsik wrote:
>>
>> I would highly recommend that you do emit(doc.field, null) so that the
>> key space doesn't get unwieldy and large. Since the id of the document
>> is part of the map results, you can always fetch it using
>> include_docs=true.
>>
>> K.
>>
>> On Wed, Apr 1, 2009 at 10:12 AM, Manjunath Somashekhar
>> wrote:
>>>
>>> hi All,
>>>
>>> We have been using couchdb (built out of trunk) for prototyping an idea
>>> and would like to thank and congratulate you folks for a simple and usable
>>> schema free db.
>>>
>>> We plan to store few million documents in couchdb and we would like to
>>> create couple of views to fetch the data appropriately. We have inserted a
>>> million documents (each containing about 20 fields). We are
>>> indexing/creating a view on a particular field of the document. The map
>>> function of the view is simple straight forward emit (emit(doc.field, doc)).
>>> It takes about 90 mins to build the required B-Tree index the first time.
>>> All the subsequent queries are performing extremely well (milli second
>>> responses). Can anything be done to reduce the 90 mins taken to build the
>>> required B-Tree index the first time?
>>>
>>> Environment details:
>>> Couchdb - 0.9.0a757326
>>> Erlang - 5.6.5
>>> Linux kernel - 2.6.24-23-generic #1 SMP Mon Jan 26 00:13:11 UTC 2009 i686
>>> GNU/Linux
>>> Ubuntu distribution
>>> Centrino Dual core, 4GB RAM laptop
>>>
>>> Thanks
>>> Manju
>>>
>
> --
> Jason Smith
> Proven Corporation
> Bangkok, Thailand
> http://www.proven-corporation.com
>
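
For anyone who wants a feel for the numbers before measuring on a real
database, here is a minimal sketch of the two map variants discussed in
this thread, run against a stand-in emit() outside CouchDB. Everything
except the two emit calls is an assumption made for illustration:
collectRows, makeDocs, jsonSize and rawBytes are hypothetical helpers,
and emit is passed in as a second argument only so the code runs on its
own (inside CouchDB it is a global and the map function takes just the
doc). JSON string length is used as a rough proxy for row size; CouchDB
itself serializes with term_to_binary, so the real figures will differ.
The sketch says nothing about B-tree build time or the include_docs
lookup cost, which still have to be measured.

```javascript
// Stand-in for the view server: run a map function over docs and collect
// whatever it emits, tagging each row with the emitting document's id.
function collectRows(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows;
}

// Variant 1: the whole document is stored as the view value.
const mapWithDoc = (doc, emit) => emit(doc.field, doc);

// Variant 2: null value; documents are fetched later with include_docs=true.
const mapWithNull = (doc, emit) => emit(doc.field, null);

// Hypothetical sample data: each doc gets an indexed field plus
// fieldsPerDoc filler fields of bytesPerField ASCII characters each.
function makeDocs(count, fieldsPerDoc, bytesPerField) {
  const docs = [];
  for (let i = 0; i < count; i++) {
    const doc = { _id: "doc-" + i, field: "key-" + i };
    for (let f = 0; f < fieldsPerDoc; f++) {
      doc["f" + f] = "x".repeat(bytesPerField);
    }
    docs.push(doc);
  }
  return docs;
}

// Rough size proxy only: length of the JSON text, not term_to_binary output.
const jsonSize = (rows) => JSON.stringify(rows).length;

// Back-of-envelope estimate from the thread: docs x fields x bytes per field.
const rawBytes = (docCount, fieldsPerDoc, bytesPerField) =>
  docCount * fieldsPerDoc * bytesPerField;

const docs = makeDocs(1000, 20, 100);
const withDoc = jsonSize(collectRows(mapWithDoc, docs));
const withNull = jsonSize(collectRows(mapWithNull, docs));

console.log("emit(key, doc) :", withDoc, "chars for", docs.length, "docs");
console.log("emit(key, null):", withNull, "chars for", docs.length, "docs");
console.log("raw estimate for 1M docs:", rawBytes(1e6, 20, 100), "bytes");
```

Scaling the sample up and timing a real view build for each variant would
give the wall clock comparison asked for above.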