Message-ID: <4A0CC46D.3030009@gmail.com>
Date: Fri, 15 May 2009 11:25:01 +1000
From: Mark Hammond
Reply-To: mhammond@skippinet.com.au
To: dev@couchdb.apache.org
CC: Zachary Zolton, B.Candler@pobox.com
Subject: Re: View Filter

On 15/05/2009 4:47 AM, Brian Candler wrote:
> On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
>> (1) people who are storing large documents in CouchDB but not indexing them
>> at all (I guess this is possible, e.g. if the doc ids are well-known or
>> stored in other documents, but this isn't the most common way of working)
>
> The proposal would exclude a document from *all* views in a particular
> design doc. So you're only going to get a benefit from this if you have a
> large number of documents (or a number of large documents) which are not
> required to be indexed in any view in that design doc.

Yep - and that is the point. Consider Jan's example, where it was filtering
on doc['type'].
If a database had (say) 10 potential values of 'type', then all filters that
only care about a single type will only care about 1 in 10 of those documents.

Taking this to its extreme, we tested Jan's patch on a view which matches
very few documents in a large database. Rebuilding that view with a filter was
18 times faster than without the filter. We put this down to the fact that the
filter managed to avoid the json encode/decode step for the vast majority of
the docs in the database. IOW, on my test database, 6 minutes is spent before
the filters can actually do anything (ie, that is just the json processing),
whereas using the filter to avoid that json step brings it down to 20 seconds.

So while not everyone will be able to see such significant speedups, many
may find it extremely useful.

> And it's reasonable, given that (as I understand it) each document is
> already only passed once to the view server, in order to be indexed by all
> the views in that design document.

I agree there is lots that can and should be done to speed up views that do
indeed care about most of the docs - such views spend relatively less time in
the json encode step and more time in the interpreter. As an experiment, I
"ported" one of our views that does look at most of the docs from javascript
to erlangview, and the performance increase was far more modest (20% maybe).
I suspect the javascript interpreter is faster than erlang, so there may well
be a level of view complexity where using javascript *increases* view
performance over erlang, even when factoring in the json processing...

Cheers,

Mark
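
PS: to make the "avoid the json step" argument concrete, here is a toy model
in Python. It is only a sketch under stated assumptions and is not how Jan's
patch is implemented: build_view, map_type0 and the doc_filter argument are
hypothetical names made up for illustration, and the JSON round trip merely
stands in for shipping a document to the view server. It counts how many
documents pay the encode/decode cost with and without a pre-filter on
doc['type'], using the 1-in-10 case described above.

```python
import json


def build_view(docs, map_fn, doc_filter=None):
    """Toy view build: return (rows, number of docs that hit the JSON step).

    doc_filter is a hypothetical predicate applied *before* the JSON round
    trip, so rejected docs never pay the encode/decode cost.
    """
    rows = []
    encoded = 0
    for doc in docs:
        # Filtered-out docs are skipped before any JSON work happens.
        if doc_filter is not None and not doc_filter(doc):
            continue
        # Stand-in for sending the doc to the view server and back.
        wire = json.dumps(doc)
        encoded += 1
        rows.extend(map_fn(json.loads(wire)))
    return rows, encoded


def map_type0(doc):
    # Hypothetical map function: index only docs of a single type.
    return [(doc["_id"], None)] if doc["type"] == "type0" else []


# 1000 docs spread evenly over 10 values of 'type'.
docs = [{"_id": str(i), "type": "type%d" % (i % 10)} for i in range(1000)]

unfiltered_rows, unfiltered_encoded = build_view(docs, map_type0)
filtered_rows, filtered_encoded = build_view(
    docs, map_type0, doc_filter=lambda d: d["type"] == "type0"
)

print(unfiltered_encoded, filtered_encoded)
```

Both runs produce the same rows; only the number of JSON round trips differs.
The more selective the filter, the bigger the saving, which fits the numbers
above (6 minutes down to 20 seconds is the 18x quoted).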