From: Robert Newson
To: user@couchdb.apache.org
Date: Tue, 13 Mar 2012 22:58:41 +0000
Subject: Re: Creating a database with lots of documents and updating a view

The view build is already batched. In my opinion your strategy A can only ever be slower or the same speed as B. Try inserting the docs using _bulk_docs, it'll go much faster.
I'd fill the database up and hit the view at the end for the fastest build time, but I'd still expect it to take a while to build the view the first time.

Do you have a reduce on the view? Are there other views in the same design document?

B.

On 13 March 2012 22:45, Daniel Gonzalez wrote:
> Hi,
>
> I am creating a database with lots of documents (3 million).
> I have a view in the database:
>
> function(doc) {
>    if (doc.PORTED_NUMBER) emit(doc.PORTED_NUMBER, doc.RECEIVING_OPERATOR);
> }
>
> To speed up view creation, I am doing the following (Strategy A):
>
>    1. Define view
>    2. Insert 1000 documents
>    3. Access the view
>    4. Goto 2
>
> And I repeat this process until all documents have been inserted.
>
> I have read that this is faster than my previous strategy (Strategy B,
> obsolete):
>
>    1. Insert all documents
>    2. Define view
>    3. Access view
>
> My problem is that, in my current Strategy A, step 3 is taking longer and
> longer. Currently I have around 300 thousand documents inserted and view
> access is taking around 120s.
> The evolution of the delay in view access has been:
>
> 2012-03-13 23:01:40,405 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
> 2012-03-13 23:03:29,589 - __main__ - INFO - - View ready, ellapsed 109
> 2012-03-13 23:03:32,945 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
> 2012-03-13 23:05:31,699 - __main__ - INFO - - View ready, ellapsed 118
> 2012-03-13 23:05:35,106 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
> 2012-03-13 23:07:28,392 - __main__ - INFO - - View ready, ellapsed 113
> 2012-03-13 23:07:31,663 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
> 2012-03-13 23:09:26,929 - __main__ - INFO - - View ready, ellapsed 115
> 2012-03-13 23:09:30,572 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
> 2012-03-13 23:11:27,490 - __main__ - INFO - - View ready, ellapsed 116
> 2012-03-13 23:11:30,784 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
> 2012-03-13 23:13:21,575 - __main__ - INFO - - View ready, ellapsed 110
> 2012-03-13 23:13:24,937 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
> 2012-03-13 23:15:23,519 - __main__ - INFO - - View ready, ellapsed 118
> 2012-03-13 23:15:26,836 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
> 2012-03-13 23:17:23,036 - __main__ - INFO - - View ready, ellapsed 116
> 2012-03-13 23:17:26,310 - __main__ - INFO - - BulkSend >> requested=   1000 ok=   1000 errors=      0
>
> It started at around 1s, and it is increasing more or less monotonically.
> It has been running for 7 hours, and only 300000 documents have
> been imported and indexed.
> If everything continues like this (I do not know what kind of mathematical
> function this is following, but to me it looks like an exponential
> function), importing the 3 million documents is going to take forever.
>
> Is there a way to speed this up?
>
> Thanks!
> Daniel
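
For reference, the approach suggested in the reply (insert everything through _bulk_docs, then hit the view once at the end) could be sketched roughly as below. This is only a minimal sketch, not the code used in the thread: the function names, the batch size of 1000, the database URL, and the design document and view names ("ported", "by_number") are all assumptions made for illustration. It uses only the Python standard library, and the HTTP sender is passed in as a parameter so the batching logic can be exercised without a running CouchDB.

```python
import json
import urllib.request
from itertools import islice


def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_payload(batch):
    """Build the JSON request body for POST /{db}/_bulk_docs."""
    return json.dumps({"docs": batch}).encode("utf-8")


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def load_all(db_url, docs, batch_size=1000, send=post_json):
    """Insert every batch first and never touch the view while loading.

    Returns the number of documents submitted. `send` is a hypothetical
    hook (defaulting to a real HTTP POST) so the loop can be tested offline.
    """
    submitted = 0
    for batch in chunked(docs, batch_size):
        send(db_url + "/_bulk_docs", bulk_payload(batch))
        submitted += len(batch)
    return submitted


def view_url(db_url, ddoc, view):
    """URL that triggers the index build; limit=0 avoids fetching rows."""
    return "{}/_design/{}/_view/{}?limit=0".format(db_url, ddoc, view)


# Example (assumed names, requires a running CouchDB):
#   load_all("http://localhost:5984/ported", docs)
#   urllib.request.urlopen(view_url("http://localhost:5984/ported",
#                                   "ported", "by_number")).read()
```

The single view request at the end will still block until the whole index is built, so the first access is expected to take a while, as noted in the reply.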