From: Martin Higham
To: user@couchdb.apache.org
Subject: Re: Struggling with a particular Map / Reduce
Date: Tue, 17 Aug 2010 10:24:11 +0100

I think it would be better to use the view to split the titles and
emit one row per author and title. A map function such as

    function(doc) {
      // One row per title occurrence, keyed by [author, title].
      for (var i = 0; i < doc.titles.length; i++) {
        emit([doc.author, doc.titles[i]], null);
      }
    }

does just this. You now have a list of keys in the form
[author, title], sorted by author and then by title.

It's then easy to produce a list of the unique author/title
combinations and a count of how often each occurs with a reduce
function (query the view with group=true to get one row per key):

    function(keys, values, rereduce) {
      if (rereduce) {
        // values are partial counts from earlier reduce calls
        return sum(values);
      } else {
        // values holds one entry per emitted row
        return values.length;
      }
    }

However, it is difficult for a reduce to produce a list of the top 3.
A reduce can only operate on the rows passed to that invocation; it
doesn't know what data is yet to come. If each invocation output only
the top 3 entries it was given, the results would be inaccurate,
because you could throw away rows that would later accumulate into the
all-time top 3.
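To make the counting step concrete, here is a rough sketch in plain
JavaScript (it runs in Node, outside CouchDB) of what the view gives
you when queried with group=true, followed by a client-side pass that
keeps the three most frequent titles per author. The sample documents
and the helper names mapDoc, countPairs and topTitles are made up for
illustration; they are not part of CouchDB.

```javascript
// Sketch only: emulates the view's output in plain JavaScript.
// Sample data and helper names are hypothetical.
const docs = [
  { author: "alice", titles: ["a", "b", "a", "c"] },
  { author: "alice", titles: ["a", "d", "b", "d", "d"] },
  { author: "bob", titles: ["x", "y", "x"] },
];

// Same idea as the map function: one [author, title] key per occurrence.
function mapDoc(doc) {
  return doc.titles.map((title) => [doc.author, title]);
}

// Stands in for the reduce with group=true: one count per distinct key.
function countPairs(allDocs) {
  const counts = new Map();
  for (const doc of allDocs) {
    for (const pair of mapDoc(doc)) {
      const key = JSON.stringify(pair);
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts].map(([key, value]) => ({ key: JSON.parse(key), value }));
}

// Client-side step: sort each author's titles by count, keep the first n.
// Ties are broken alphabetically so the result is deterministic.
function topTitles(rows, n) {
  const byAuthor = {};
  for (const { key: [author, title], value } of rows) {
    (byAuthor[author] = byAuthor[author] || []).push({ title, count: value });
  }
  for (const author of Object.keys(byAuthor)) {
    byAuthor[author].sort(
      (a, b) => b.count - a.count || a.title.localeCompare(b.title)
    );
    byAuthor[author] = byAuthor[author].slice(0, n);
  }
  return byAuthor;
}

const rows = countPairs(docs);
const top = topTitles(rows, 3);
console.log(JSON.stringify(top, null, 2));
```

In this sample, title "d" appears only in alice's second document yet
ends up in her top 3, which is why the counts have to be complete
before anything is cut off.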
Once you have the list of unique author/title pairs and their
frequencies, you can sort and filter them either in the client or in a
list function.

Hope this helps

Martin

On 17 August 2010 09:26, Ian Wootten wrote:
> Hi Everyone,
>
> I was hoping somebody might be able to solve a problem I'm having
> attempting to implement a view at the moment.
>
> Essentially, what it does is to take a collection of documents which
> each have a single author and a list of names (which are possibly
> repeated). There may be multiple documents by the same author, with
> the same names within. Here's an example doc.
>
> doc.author
> doc.titles = ['sometitle', 'someothertitle', 'sometitle', 'anothertitle']
>
> I would like to return a list of the top 3 titles for each
> author across all documents. I have tried and failed for several days
> to get this working correctly.
>
> So far, my map is as follows, giving the unique titles for a document,
> not ordered at all:
>
> function(doc) {
>
>     var unique_titles = [];
>
>     for(var i in doc.titles)
>     {
>         var count=0;
>
>         for(var j in unique_titles)
>         {
>             if(doc.titles[i]==unique_titles[j])
>             {
>                 count++;
>             }
>         }
>
>         if(count==0)
>         {
>             unique_titles.push(doc.titles[i]);
>         }
>     }
>
>     for(var k=0; k<unique_titles.length; k++)
>     {
>         emit(doc.author, unique_titles[k]);
>     }
> }
>
> My reduce is as follows; this returns two unique titles from a single
> document when only a single document exists for an author (I think):
>
> function(keys, values, rereduce) {
>     return values.splice(0,2);
> }
>
> My problem is that:
>
> a) I can't return more than 2 items from the values array (if I set
> the splice length to 3, Futon spits back a non-reducing error at me).
> b) Where multiple documents exist for the same author, in some
> instances I see a weird multi-dimensional array returned (rather than
> just two values). For example:
> [['sometitle','someothertitle'],['anothertitle'],['afurthertitle']]
>
> Presumably b) is the result of multiple documents for a single author
> interfering with one another, though I'm confused as to how I
> configure my map/reduce in order to get the information I'm after (I
> also wonder if it's even possible).
>
> I've attempted to understand the documentation on reduce functions,
> taking a look at the many examples that exist too, but I'm unable to
> understand them well enough to solve my problem.
>
> I'd appreciate any help on this!
>
> Thanks,
>
> Ian
>