Message-ID: <4BCC643B.4030208@canonical.com>
Date: Mon, 19 Apr 2010 10:10:03 -0400
From: Eric Casteleijn
To: user@couchdb.apache.org
Subject: Re: getting most recent doc
References: <4BC9CF46.6070006@canonical.com>
 <85340AE9-EBD2-440B-8538-37676DEA624B@apache.org>
In-Reply-To: <85340AE9-EBD2-440B-8538-37676DEA624B@apache.org>

On 04/19/2010 09:41 AM, Adam Kocoloski wrote:
> On Apr 17, 2010, at 11:09 AM, Eric Casteleijn wrote:
>
>> On 04/16/2010 04:46 AM, wolfgang haefelinger wrote:
>>> Thanks Robert
>>>
>>> for your answer. However, it is not exactly what I was looking for
>>> (due to my inappropriate problem description).
>>>
>>> Firstly, I do want to have the document instead of the time stamp in
>>> order to avoid that additional document fetch. That's obviously easy
>>> to fix:
>>>
>>> function(doc) { //
>>>   emit([doc.name, doc.timestamp], doc);
>>> }
>>
>> Don't do that, it's unnecessary, because you can always call any view with '?include_docs=true' and it will add a 'doc' member to each row, containing the document, and worse than that, it's harmful, as it makes the indexes stored on disk many times larger than they need to be. (Depending on the size of your documents this can really make a huge difference, anecdotal evidence suggests: gwibber used to do this, and when I changed it, the indexes stored on disk decreased some 90% in size.)
>>
>> If you always want the whole document, just emit null for a value and always call the view with include_docs.
>>
>> If there are cases where you don't want the whole document, decide which data you need and only emit that.
>
> Hi Eric, I don't think its correct to have a blanket recommendation to always use include_docs=true. For large range queries on a view the query performance will be much better - up to 10x better throughput on large DBs in my experience - if the doc is already included. Yes, the view index will balloon in size, but some people may be willing to make that tradeoff. Cheers,

Oops, thanks for catching that Adam, and my apologies, that was rather myopic.
I didn't think about the other side of the tradeoff, but that makes a lot of sense.

I still wonder in that case if there is something you can do to shrink the stored views somewhat: gwibber had a number of views that emitted the whole document, but those documents (typically representing a Twitter or identi.ca message) weren't very large in themselves. My database, after compaction, was somewhere between 70 and 80 MB, whereas the indexes took over a GB.

Since gwibber+desktopcouch run on the desktop, where only one client typically talks to couch, I still think we made the right decision to sacrifice speed for disk space. On a server, both are important though, considering we host multiple couchdbs per user. Luckily we don't compute the views for the gwibber dbs server side, but I'm sure it's something we'll run into again elsewhere.

-- 
eric casteleijn
https://code.launchpad.net/~thisfred
Canonical Ltd.
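
For anyone weighing the tradeoff discussed in this thread, here is a rough sketch of the two view styles side by side. It does not talk to CouchDB at all: the sample documents, buildIndex and includeDocs are made-up stand-ins for view indexing and for '?include_docs=true', and emit is passed in as a parameter here, whereas in a real map function it is a global. Serialized length is only a crude proxy for on-disk index size, and the sketch says nothing about the query throughput side of the argument.

```javascript
// Hypothetical sample data: three small message documents.
const docs = [
  { _id: "m1", name: "alice", timestamp: 1, text: "first message ".repeat(10) },
  { _id: "m2", name: "alice", timestamp: 2, text: "second message ".repeat(10) },
  { _id: "m3", name: "bob", timestamp: 1, text: "third message ".repeat(10) },
];

// Variant A: emit the whole document as the value (no extra fetch, larger index).
const mapFat = (doc, emit) => emit([doc.name, doc.timestamp], doc);

// Variant B: emit null and attach documents at query time (smaller index).
const mapLean = (doc, emit) => emit([doc.name, doc.timestamp], null);

// Toy stand-in for view indexing: collect one row per emit call.
// Unlike a real view, the rows are not sorted by key.
function buildIndex(mapFn, documents) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows;
}

// Toy stand-in for ?include_docs=true: add a 'doc' member to each row.
function includeDocs(rows, documents) {
  const byId = new Map(documents.map((d) => [d._id, d]));
  return rows.map((row) => ({ ...row, doc: byId.get(row.id) }));
}

const fatIndex = buildIndex(mapFat, docs);
const leanIndex = buildIndex(mapLean, docs);

// Serialized length as a rough proxy for how much each index has to store.
const fatSize = JSON.stringify(fatIndex).length;
const leanSize = JSON.stringify(leanIndex).length;
console.log({ fatSize, leanSize });

// Either way the caller can end up with the same documents.
const leanWithDocs = includeDocs(leanIndex, docs);
console.log(leanWithDocs[0].doc._id === fatIndex[0].value._id);
```

Which variant is the better choice depends on the deployment, as discussed above: a single desktop client can afford the extra fetch, while large range queries on a server may prefer the larger index.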