From: "Chris Anderson"
Date: Fri, 31 Oct 2008 08:07:07 -0700
To: couchdb-user@incubator.apache.org
Subject: Re: indexes and a waste of a good map reduce

On Thu, Oct 30, 2008 at 6:35 PM, Ben Nevile wrote:
>
> So let's say we've implemented a word count. We have a nice view that is
> indexed by word, so we can query any word and find out how many times it
> appears. But if we want to know what words are the most frequent, seems to
> me that currently we're out of luck.

Ben,

You're right that sorting on reduce values is not part of the current feature set. Here is how I've done that work in the past:

First, define just a map view, keyed so that the rows you want reduced together share the same key. Then use the key_reduce function from this code (or write your own):

http://github.com/jchris/couchrest/tree/master/lib/couchrest/helper/pager.rb

The idea is that this code pages through the view, yielding each key along with all of the values associated with it. You can do whatever you like with this data. I define a "reduce" function in Ruby and save its output as documents in another database. E.g. if your data is in my-db, then key_reduce into my-db-reduce.
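To make the key_reduce pass concrete, here is a minimal in-memory sketch, assuming the map view's rows arrive as [key, value] pairs already sorted by key (as CouchDB returns them). The names each_key_group and key_reduce here are hypothetical illustrations, not the couchrest helper's actual API, and the "_id"/"key"/"value" document fields are assumptions. The final sort_by only stands in for what a second view on my-db-reduce would give you.

```ruby
# Hypothetical sketch of the key_reduce idea; NOT couchrest's real API.
# Rows are assumed sorted by key, so rows sharing a key are adjacent.

# Walk sorted [key, value] rows, yielding each key with all of its values.
def each_key_group(rows)
  return enum_for(:each_key_group, rows) unless block_given?
  rows.chunk_while { |a, b| a[0] == b[0] }.each do |group|
    yield group.first[0], group.map { |_key, value| value }
  end
end

# Apply a Ruby "reduce" to each key group and build one document per key,
# i.e. what would be saved into the second database (my-db-reduce).
# The document field names are assumptions for illustration.
def key_reduce(rows, &reducer)
  each_key_group(rows).map do |key, values|
    { "_id" => key.to_s, "key" => key, "value" => reducer.call(values) }
  end
end

# Rows a word-count map view might emit: one [word, 1] row per occurrence.
map_rows = [
  ["apple", 1], ["apple", 1], ["banana", 1],
  ["cherry", 1], ["cherry", 1], ["cherry", 1]
]

reduce_docs = key_reduce(map_rows) { |values| values.sum }

# A view on my-db-reduce that emits each count as its key would return the
# docs ordered by count; sort_by stands in for that here (most frequent first).
most_frequent = reduce_docs.sort_by { |doc| -doc["value"] }
most_frequent.each { |doc| puts "#{doc['key']}: #{doc['value']}" }
```

Because the rows are sorted by key, grouping adjacent rows is enough; only one key's values need to be held at a time, which is what makes paging through a large view workable.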
Then you can define another set of map (and/or reduce) views on my-db-reduce that sort the keys by their reduced value. For the word count, a map that emits each reduce document's count as its key returns the words ordered by frequency.

There are some missing features here. Chiefly, this code is not well documented, and I'm certain parts of it could use more convention and less ad-hoc decision-making on the part of the user. But the really big missing feature is incremental reduce. It's just a matter of bookkeeping, really, but it's not yet implemented. Perhaps the next time I have a novel use for key_reduce I'll get it working incrementally.

Hope this helps.

-- 
Chris Anderson
http://jchris.mfdz.com