From: Talib Sharif
To: user@couchdb.apache.org
Subject: Re: Some stats about couch DB
Date: Fri, 23 Jul 2010 22:51:22 -0700
Message-Id: <04BDA5F8-F209-4258-A5A1-52DFCF3784B0@mymedify.com>
In-Reply-To: <673E2CFF-D6F7-4077-BB43-450B09EC089A@apache.org>

Thanks Chris,

This is extremely helpful.

-Talib

On Jul 23, 2010, at 6:42 PM, J Chris Anderson wrote:

> On Jul 23, 2010, at 5:01 PM, Talib Sharif wrote:
>
>> Hi All,
>>
>> As I am playing more and more with CouchDB (it is relaxing and
>> fun), I am trying to understand the limits and the
>> expectations in a large production environment.
>>
>> Right now I have about 100K documents and about 10 different
>> views; one of the views does about 100 emits per document.
>>
>> Building the view indexes is taking about 7-8 hours.
>
> This is about right for 10 million rows. That works out to about 350
> rows per second (maybe more, depending on what your other views are
> doing), which is a bit slower than I'm used to seeing, but it
> depends on the size of your emitted keys and values. If you can
> shrink the keys or the values you should see some speedup (marginal,
> not an order of magnitude).
>
> Because view generation is incremental, in production the 7-8 hours
> isn't the big issue; it's whether view generation can keep up with
> the insert rate. So if you are generating less than a few documents
> per second (x 100 emitted rows) then you should be able to keep the
> indexes current. If the indexes start to fall behind I'd suggest
> either upgrading hardware or moving to a clustered solution like
> CouchDB-Lounge.
>
> For purposes of prototyping you will probably be happier working on
> a subset of the documents.
>
>> I would like to know how other people are using it.
>> Is 7-8 or even 24 hours of checkpointing view generation typical?
>> How many documents do people have?
>> What is other people's experience in generating a view on, let's say,
>> 1 million documents?
>>
>> I have switched to the native _sum function for reduce. What else
>> is taking long?
>> Is it the map function written in JavaScript? Is it
>> the index that's getting too big?
>
> Using an Erlang view function could potentially speed things up (but
> my guess is you are more likely disk-IO bound, not CPU bound, so
> maybe it won't make much difference).
>
>> Is the view generation linear, or does it get worse when you have
>> more documents?
>
> The btree should get slower at roughly O(log n), where n is the
> number of rows. The base of the log is pretty big, too. Once you get
> up to billion-row territory you'll probably want to look more
> closely at CouchDB Lounge or the Cloudant clustering.
>
>> I would extremely appreciate help in answering or discussing these
>> questions.
>>
>> Thanks in advance,
>> Talib
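
For reference, the arithmetic in the thread can be checked with a small Python sketch. It reuses the figures quoted above (100K documents, about 100 emits per document, 7-8 hours of indexing). The function names are made up for illustration, and the B-tree fanout of 100 is purely an assumption: the thread only says the base of the log is "pretty big" and gives no number.

```python
import math


def rows_per_second(docs, emits_per_doc, hours):
    """Average number of emitted view rows indexed per second over a full build."""
    return docs * emits_per_doc / (hours * 3600)


def sustainable_docs_per_second(row_rate, emits_per_doc):
    """Insert rate (documents/second) the indexer can keep up with at a given row rate."""
    return row_rate / emits_per_doc


def btree_depth(rows, fanout):
    """Rough number of B-tree levels for `rows` entries.

    `fanout` is a hypothetical branching factor, not a CouchDB setting.
    """
    return math.ceil(math.log(rows) / math.log(fanout))


if __name__ == "__main__":
    # Figures from the thread: 100K docs x 100 emits, built in 7 to 8 hours.
    slow = rows_per_second(100_000, 100, 8)
    fast = rows_per_second(100_000, 100, 7)
    print(f"rows/sec: {slow:.0f} to {fast:.0f}")
    print(f"docs/sec the indexer can absorb: {sustainable_docs_per_second(slow, 100):.1f}")

    # Depth grows with log(rows), so 100x more rows adds only one level
    # at the assumed fanout of 100.
    for rows in (10_000_000, 1_000_000_000):
        print(rows, "rows ->", btree_depth(rows, fanout=100), "levels")
```

The point of the sketch is only that the quoted "about 350 rows per second" and "a few documents per second" follow directly from the stated totals, and that logarithmic growth makes the row count matter far less than the per-row key and value size.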