From: "Johan Liseborn"
To: couchdb-user@incubator.apache.org
Subject: Re: Is it possible to evaluate a view on a 20.000 documents database?
Date: Fri, 1 Aug 2008 09:37:53 +0200
Message-ID: <3a344bc70808010037h2c3fb1b5raca06f634378b79@mail.gmail.com>
In-Reply-To: <4aa4f4d60807311538h5a9c9d02k2dc507a0b566f34@mail.gmail.com>

On Fri, Aug 1, 2008 at 00:38, Demetrius Nunes wrote:
> The view I am trying to create is really simple:
>
> function(doc) {
>   if (doc.classe_id.match(/8a8090a20075ffba010075ffbed600028a8090a20075ffba010075ffbf7200c48a8090a20075ffba010075ffbf7200d9/))
>     emit(doc.id, doc);
> }
>
> It's being applied to a 20.000 documents dataset and I've already waited
> several minutes until the CPU cooled off, but to my surprise, the view is
> still taking a long time to respond when I try to run it. Ive never actually
> got a result out of it...
>
> Am I doing something wrong?

I guess you have already gotten a number of answers, but just to give you some additional input (which points in the same direction), here is some data from a little experiment I just did.

I have a database consisting of documents that describe "projects". Each document has a number of fields, including fields for project manager, due date, an array of project activities (which in turn have descriptions, an array of assigned workers, etc.), an array of notes, and a field giving the priority. The point is that the documents are "semi-complex", or at least I *think* they could be considered so. I am not sure how much this matters, but it seems to matter a little, at least when the document itself is part of the output of the view (which it *isn't* in my example below, but anyway...).

I am running this on a second-generation MacBook (Core 2 Duo) with Erlang R12B-3, SMP enabled.

Now, I have a view which gives me the number of projects per priority level. The view consists of the map and reduce functions below. Mind you, I am not sure that I am doing this entirely correctly; I am pretty new to CouchDB (this is my third day of playing with it, actually), and I am still figuring the map/reduce stuff out. The result of the view seems to be correct, though.

map:

  function(doc) {
    if (doc.type == 'task') emit(doc.priority, 1);
  }

reduce:

  function(keys, values) {
    return sum(values);
  }

I just ran a test where I had a database already consisting of 42.000 project documents (the view had already been indexed on these documents).
I added an additional 10.000 documents, and then ran the view above like so:

  Johans-MacBook% time curl 'localhost:5984/test-001/_view/tasks/per_prio_count?group=true'

The result I got back was:

  {"rows":[{"key":1,"value":10391},{"key":2,"value":10399},{"key":3,"value":10482},{"key":4,"value":10320},{"key":5,"value":10408}]}
  curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'  0.01s user 0.03s system 0% cpu 13:32.02 total

Running the view again gave the following result:

  {"rows":[{"key":1,"value":10391},{"key":2,"value":10399},{"key":3,"value":10482},{"key":4,"value":10320},{"key":5,"value":10408}]}
  curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'  0.00s user 0.00s system 0% cpu 0.703 total

As the last part, I added an additional 10 documents and then re-ran the view, giving the following result:

  {"rows":[{"key":1,"value":10392},{"key":2,"value":10400},{"key":3,"value":10487},{"key":4,"value":10322},{"key":5,"value":10409}]}
  curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'  0.00s user 0.00s system 0% cpu 1.207 total

AFAIU, when you add new documents and then evaluate a view that includes those documents, indexing will happen, but only for the newly added documents (i.e. already indexed documents will not be re-indexed). I believe this means that the time to index will be, in some way, proportional to the number of *new* documents. I believe I have seen a big-O bound for this somewhere, but I don't remember right now if it is O(n), O(log n), or something else (I am sure someone else on the list can answer that :-).

As can be seen from the results, when CouchDB had to index the 10.000 new documents, it took about 13.5 minutes (13:32) to get the result, but once all the documents had been indexed, the answer came back in 0.7 seconds. Having to index 10 more documents did not take long either, giving an answer in 1.2 seconds.

Hope this helps in some way.

Cheers,

johan

P.S. I am really excited about CouchDB; kudos to Damien and everyone else involved (sorry, I don't know all of your names yet :-)

-- 
Johan Liseborn
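
The incremental indexing described above (only documents added since the last query get passed to the map function) can be sketched outside CouchDB in a few lines of plain JavaScript. This is a toy model under stated assumptions, not how CouchDB implements views: `ToyCountView`, `makeDocs`, and the sample data are made up for illustration, `emit` is passed to the map function explicitly instead of being a global, and the reduce step is folded into a running sum per key.

```javascript
// Toy model of an incrementally updated "count per key" view.
// Assumption: this is NOT CouchDB's implementation, only an illustration.

// Same logic as the map function in the message, with emit passed in explicitly.
function mapTask(doc, emit) {
  if (doc.type === 'task') emit(doc.priority, 1);
}

// Hypothetical helper: keeps a running sum per key and remembers
// how many documents it has already indexed.
class ToyCountView {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.counts = new Map(); // key -> reduced value (running sum)
    this.indexedUpTo = 0;    // number of documents already indexed
    this.mapCalls = 0;       // total number of documents passed to mapFn
  }

  // Index only the documents added since the last query, then return grouped rows.
  query(docs) {
    for (let i = this.indexedUpTo; i < docs.length; i++) {
      this.mapCalls++;
      this.mapFn(docs[i], (key, value) => {
        this.counts.set(key, (this.counts.get(key) || 0) + value);
      });
    }
    this.indexedUpTo = docs.length;
    return [...this.counts.entries()]
      .sort((a, b) => a[0] - b[0])
      .map(([key, value]) => ({ key, value }));
  }
}

// Made-up sample data: priorities cycle through 1..5.
function makeDocs(n, startId) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({ _id: String(startId + i), type: 'task', priority: (i % 5) + 1 });
  }
  return docs;
}

const db = makeDocs(1000, 0);
const view = new ToyCountView(mapTask);

const first = view.query(db);           // first query: indexes all 1000 documents
const callsAfterFirst = view.mapCalls;

db.push(...makeDocs(10, 1000));         // add 10 new documents
const second = view.query(db);          // second query: indexes only the 10 new ones
const callsForSecond = view.mapCalls - callsAfterFirst;

console.log(JSON.stringify({ rows: second }));
console.log('documents mapped by second query:', callsForSecond);
```

In this sketch the first query maps all 1000 documents, while the second maps only the 10 new ones. That matches the shape of the timings above (a slow first query after a bulk insert, fast queries afterwards), though it says nothing about the actual cost per document in CouchDB.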