Message-ID: <4FBE5318.5010202@gmail.com>
Date: Thu, 24 May 2012 10:26:16 -0500
From: "Kevin R. Coombes"
Organization: UT M.D. Anderson Cancer Center
To: user@couchdb.apache.org
CC: Robert Newson
Subject: Re: Am I doing something fundamentally wrong?

+1 to being surprised....

I just built a database and filled it with 5.5M documents. I then built
the view I wanted; initially, it occupied 32GB. I then compacted the
view, and it now takes less than 3GB. I have to admit that I really
don't understand why this is the case, but I frequently see a tenfold
reduction in disk space when compacting views immediately after they
are first built.

While compacting views will address the "disk usage" part of the
initial question, it will not help with the amount of time it takes.
In my experience, it takes about as long to compact a view as it does
to build one.

Kevin

On 5/24/2012 9:21 AM, Robert Newson wrote:
> Databases (and views) need compacting even if you never update or
> delete a document. Try it, you might be surprised.
>
> B.
>
> On 24 May 2012 15:19, Sean Copenhaver wrote:
>> I believe multiple design documents will build views concurrently but one
>> design document is basically done sequentially by the change sequence...
>> not positive.
>>
>> So you could try splitting out your views into multiple design documents
>> and hit them to see if that helps spread out the CPU usage. I want to say a
>> lot of the CPU usage is the serialization process that is happening when
>> communicating from CouchDB's core to the view engine process.
>>
>> Anyway, with a list function you can specify any view, and _all_docs is a
>> view of all documents in a database. So if you know the ids you want to
>> work with, you can do a normal view query with a list function.
>> http://wiki.apache.org/couchdb/HTTP_Document_API#all_docs
>>
>> That's what Robert was trying to get at.
>>
>> On Thu, May 24, 2012 at 9:55 AM, Mike Kimber wrote:
>>
>>> Robert,
>>>
>>> Couchdb Lists work on top of views (and look great by the way), however
>>> that brings me back to my initial post (causes an error on this mailing
>>> list for some reason but you can find a copy here
>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201205.mbox/%3CA7D50E04F38FD44D9D914F2ABCA592BF2E6E690685@BE259.mail.lan%3E)
>>> :-). Namely, generating a view (well, a design document with views in it) on
>>> our data set takes between 6 (simple view) and 16 hours, takes up a lot of
>>> disk space for what seems a small amount of data, and burns a CPU at 100%
>>> for the full time it runs, i.e. no IO contention and it can't use multiple
>>> cores/CPUs. So again, am I doing something fundamentally wrong, or is this
>>> just the way Couch works and most people don't have a data set like ours so
>>> it does not take that long to create views, or does BigCouch solve the
>>> issue (although it would seem 10 BigCouch nodes would still take an hour)?
>>>
>>> Looks like you work at Cloudant, so hopefully you might be able to provide
>>> some answers based on real world experience?
>>>
>>> Mike
>>>
>>>
>>> -----Original Message-----
>>> From: Robert Newson [mailto:rnewson@apache.org]
>>> Sent: 24 May 2012 12:08
>>> To: user@couchdb.apache.org
>>> Subject: Re: Am I doing something fundamentally wrong?
>>>
>>> Or use a list function:
>>>
>>> http://wiki.apache.org/couchdb/Formatting_with_Show_and_List
>>>
>>> You can use one with _all_docs and you can POST an array of ids too.
>>>
>>> http://wiki.apache.org/couchdb/HTTP_view_API
>>>
>>>> Since 0.9 you can also issue POST requests to views where you can send
>>>> the following JSON structure in the body:
>>>> {"keys": ["key1", "key2", ...]}
>>>
>>> B.
>>>
>>> On 24 May 2012 11:58, Mike Kimber wrote:
>>>> Looking at the Show documentation and running a quick test, I don't think
>>>> this helps, as a show function has to be referenced by a doc._id or view
>>>> key. If these aren't provided it returns null. This makes sense, as it's
>>>> for generating an HTML or XML page/doc etc.
>>>> So I'd have to get a list of all the doc IDs I want and then call the show
>>>> function for each, and to get a filtered list I need a view.
>>>> Mike
>>>>
>>>> -----Original Message-----
>>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>>> Sent: 24 May 2012 10:47
>>>> To: user@couchdb.apache.org
>>>> Subject: RE: Am I doing something fundamentally wrong?
>>>>
>>>> Aurélien,
>>>>
>>>> Thanks for the response, and apologies: I didn't get a notification
>>>> (e-mail) of my original post (or the 2nd one) or your response. When I look
>>>> at my original post in Google Reader it has "An error occurred while
>>>> fetching this message, sorry !", so there must be something in the e-mail
>>>> that the mailing list system does not like.
>>>> In response to your original response "I'm a bit puzzled by the fact
>>>> that your map functions use the document ID": I do this because I load the
>>>> data into Luciddb and this allows me to join between tables.
>>>> This is not my end game; this is just a compromise due to the time it
>>>> takes to generate a view and my need to play/discover with the data.
>>>> I will look at show to see if it helps; however, it does not really
>>>> answer my original questions, and it does not remove the more general issue
>>>> that a view build takes a very long time, only uses a single CPU, and uses
>>>> a bucket load of space even with compression on (no idea why, when it has a
>>>> lot less data than the original).
>>>> Thanks
>>>>
>>>> Mike
>>>>
>>>> -----Original Message-----
>>>> From: Aurélien Bénel [mailto:aurelien.benel@utt.fr]
>>>> Sent: 24 May 2012 07:40
>>>> To: user@couchdb.apache.org
>>>> Subject: Re: Am I doing something fundamentally wrong?
>>>>
>>>> Hi Mike,
>>>>
>>>>> Didn't seem to get there first time so having another go
>>>> As I wrote in my earlier post, the use of 'map' functions in both of
>>>> your examples is overkill.
>>>> Use 'show' functions instead. They won't require an index to be built.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Aurélien
>>
>>
>> --
>> "The limits of language are the limits of one's world." - Ludwig von
>> Wittgenstein
>>
>> "Water is fluid, soft and yielding. But water will wear away rock, which is
>> rigid and cannot yield. As a rule, whatever is fluid, soft and yielding
>> will overcome whatever is rigid and hard. This is another paradox: what is
>> soft is strong." - Lao-Tzu
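Robert's and Kevin's advice to compact views can be tried directly over HTTP. Below is a minimal Python sketch, assuming the CouchDB 1.x compaction endpoints: POST /{db}/_compact for the database file and POST /{db}/_compact/{ddoc} for the view indexes of one design document (named without its _design/ prefix), sent with a JSON content type. The helper name compaction_request, the server URL and the names mydb and myviews are hypothetical, and the sketch omits the admin credentials a real server will usually require. It only builds the request; sending it is left to urllib.request.urlopen.

```python
from typing import Optional
from urllib.parse import quote
import urllib.request


def compaction_request(base_url: str, db: str,
                       ddoc: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a CouchDB compaction request.

    ddoc=None -> POST /{db}/_compact    (compact the database file)
    ddoc="x"  -> POST /{db}/_compact/x  (compact the views of _design/x)
    """
    # Database names may contain "/", which must be sent as %2F.
    path = "/" + quote(db, safe="") + "/_compact"
    if ddoc is not None:
        # The design document is named without its "_design/" prefix here.
        path += "/" + quote(ddoc, safe="")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=b"",  # no body is needed; the JSON content type is what matters
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: compact the view indexes of _design/myviews in mydb.
req = compaction_request("http://localhost:5984", "mydb", "myviews")
print(req.get_method(), req.full_url)
# prints: POST http://localhost:5984/mydb/_compact/myviews
# To actually send it (credentials omitted): urllib.request.urlopen(req)
```

Compaction should run in the background, so the POST returns before the work is finished; progress can be followed with GET /_active_tasks. As Kevin notes, it can take about as long as the original view build, so it recovers disk space rather than time.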
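Robert's suggestion of POSTing an array of ids, with the {"keys": ["key1", "key2", ...]} body quoted from the wiki, can be sketched the same way. Assuming the CouchDB 1.x view API, a POST to /{db}/_all_docs or /{db}/_design/{ddoc}/_view/{view} with that body returns only the matching rows, and include_docs=true embeds the full documents. The helper keys_request and the names mydb, id1, id2, reports and by_customer are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Optional, Tuple
from urllib.parse import quote, urlencode


def keys_request(base_url: str, db: str, keys: Iterable,
                 view: Optional[Tuple[str, str]] = None,
                 include_docs: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST that fetches only the given keys.

    view=None         -> POST /{db}/_all_docs (keys are document ids)
    view=(ddoc, name) -> POST /{db}/_design/{ddoc}/_view/{name}
    """
    path = "/" + quote(db, safe="")
    if view is None:
        path += "/_all_docs"
    else:
        ddoc, name = view
        path += "/_design/" + quote(ddoc, safe="") + "/_view/" + quote(name, safe="")
    if include_docs:
        # Ask CouchDB to embed each matching document in its row.
        path += "?" + urlencode({"include_docs": "true"})
    # The body format quoted in the thread: {"keys": ["key1", "key2", ...]}
    body = json.dumps({"keys": list(keys)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: fetch two known documents without building a view.
req = keys_request("http://localhost:5984", "mydb", ["id1", "id2"], include_docs=True)
print(req.full_url)              # http://localhost:5984/mydb/_all_docs?include_docs=true
print(req.data.decode("utf-8"))  # {"keys": ["id1", "id2"]}
```

Because _all_docs is served from the database's own id index, fetching known ids this way needs no new view to be built, which is the cost Mike was trying to avoid.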
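Sean's suggestion (which he flags as unconfirmed) of splitting views across several design documents amounts to rewriting one design document into several. A sketch in Python, assuming the standard design document layout ({"_id": "_design/...", "language": ..., "views": {name: {"map": ..., "reduce": ...}}}); the naming scheme {prefix}_{view_name}, the helper split_design_doc and the sample views are hypothetical.

```python
from typing import Dict, List


def split_design_doc(prefix: str, views: Dict[str, dict],
                     language: str = "javascript") -> List[dict]:
    """Turn one design document's views into one design document per view.

    Each result is named _design/{prefix}_{view_name} (a hypothetical naming
    scheme) and keeps the original map/reduce definition unchanged.
    """
    return [
        {"_id": f"_design/{prefix}_{name}",
         "language": language,
         "views": {name: definition}}
        for name, definition in sorted(views.items())
    ]


# Hypothetical views that previously lived together in _design/stats.
views = {
    "by_type": {"map": "function(doc) { emit(doc.type, 1); }", "reduce": "_sum"},
    "by_date": {"map": "function(doc) { emit(doc.date, null); }"},
}
for ddoc in split_design_doc("stats", views):
    print(ddoc["_id"])  # _design/stats_by_date, then _design/stats_by_type
```

Each resulting document would then be saved with a PUT to /{db}/_design/{prefix}_{view_name} and queried once to start its index build, as Sean describes. The trade-off is worth measuring rather than assuming: as I understand CouchDB 1.x, views in one design document share a single index and are built in one pass over the database, while each separate design document makes its own pass and sends every document to the view server again, so splitting may buy parallelism at the cost of more total work.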