From: J Chris Anderson
Subject: Re: Some stats about couch DB
Date: Fri, 23 Jul 2010 18:42:05 -0700
Message-Id: <673E2CFF-D6F7-4077-BB43-450B09EC089A@apache.org>
Reply-To: user@couchdb.apache.org
To:
user@couchdb.apache.org

On Jul 23, 2010, at 5:01 PM, Talib Sharif wrote:

> Hi All,
>
> As I am playing more and more with CouchDB (it is relaxing and fun), I am just trying to understand the limits and the expectations in a large production system environment.
>
> Right now I have about 100K documents and about 10 different views; one of the views does about 100 emits per document.
>
> As I am building the view indexes, it is taking about 7-8 hours.

this is about right for 10 million rows. That works out to about 350 rows per second (maybe more depending on what your other views are doing), which is a bit slower than I'm used to seeing, but it depends on the size of your emitted keys and values. If you can shrink the keys or the values you should see some speedup (marginal, not an order of magnitude).

because view generation is incremental, in production the 7-8 hours isn't the big issue; it's whether view generation can keep up with the insert rate. So if you are generating fewer than a few documents per second (x 100 emitted rows) then you should be able to keep the indexes current. If the indexes start to fall behind I'd suggest either upgrading hardware or moving to a clustered solution like CouchDB-Lounge.

for purposes of prototyping you will probably be happier working on a subset of the documents.

> I would like to know: how are other people using it?
> Is 7-8 or even 24 hours of checkpointing view generation typical?
> How many documents do people have?
> How is other people's experience generating a view on, let's say, 1 million documents?
>
> I have switched to the native _sum function for reduce. What else is taking long? Is it the map function written in JavaScript? Is it the index that's getting too big?
>

using an Erlang view function could potentially speed things up (but my guess is you are more likely disk-IO bound, not CPU bound, so maybe it won't make much difference).

> Is the view generation linear, or does it get worse when you have more documents?

the btree should get slower at roughly O(log n), where n is the number of rows. The base of the log is pretty big, too. Once you get into billion-row territory you'll probably want to look more closely at CouchDB-Lounge or the Cloudant clustering.

> I would greatly appreciate help in answering or discussing these questions.
>
> Thanks in advance,
> Talib
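PS: to put the numbers above together, and to sketch the kind of view being described (one map function emitting roughly 100 rows per document, reduced with the built-in _sum): the design document below is a hypothetical illustration, not the actual view from this thread. The `tags` field and the `stats`/`by_tag` names are assumptions; adapt them to your schema.

```javascript
// Back-of-the-envelope check of the throughput quoted above:
// 100,000 docs x 100 emits = 10,000,000 rows in ~7.5 hours.
const rows = 100000 * 100;
const seconds = 7.5 * 60 * 60;
const rowsPerSec = Math.round(rows / seconds);
console.log(rowsPerSec); // about 370 rows per second

// A sketch of a design document along these lines: one map
// function emitting ~100 rows per document (here, one per tag),
// reduced with the native _sum. Field and view names are made up.
const designDoc = {
  _id: "_design/stats",
  views: {
    by_tag: {
      // Keep emitted keys and values small: short strings and
      // plain numbers index much faster than large structures.
      map: function (doc) {
        (doc.tags || []).forEach(function (tag) {
          emit(tag, 1);
        });
      }.toString(),
      reduce: "_sum" // built-in reduce; faster than a JS reduce
    }
  }
};
```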
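PPS: the "big base of the log" point can be made concrete. Assuming a btree fanout in the low hundreds (the real value depends on your key sizes, so treat the 100 below as an illustration only), tree depth grows very slowly with row count:

```javascript
// Rough btree depth estimate: depth ~ log_fanout(n).
// The fanout of 100 is an assumed value for illustration.
function btreeDepth(rowCount, fanout) {
  return Math.ceil(Math.log(rowCount) / Math.log(fanout));
}

console.log(btreeDepth(10e6, 100)); // 10 million rows -> depth 4
console.log(btreeDepth(1e9, 100));  // 1 billion rows  -> depth 5
```

so going from 10 million to a billion rows only adds about one level of tree traversal per lookup, which is why index reads stay fast even as views get large.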