Date: Wed, 18 Aug 2010 13:12:07 +0100
Subject: Re: Struggling with a particular Map / Reduce
From: Ian Wootten
To: user@couchdb.apache.org

Thanks Mark. I've ended up using a combination of Martin's, Robert's and your approaches. It seems to be giving me everything I need now.

On 17 August 2010 18:45, Mark J. Reed wrote:
> Bear in mind that this approach doesn't distinguish between multiple
> instances of a given title within a single doc and multiple docs. From
> your original email I thought you wanted to collapse multiples of the
> same title within one document, but count multiples separately if they
> come from different docs.
>
> If that's the case, you'll still want to make the list unique before
> emitting it... but a double loop isn't the way to go there; something
> like this will work:
>
> map: function(doc) {
>   var titles = {};
>   for (var i = 0; i < doc.titles.length; i++) {
>     titles[doc.titles[i]] = 1;
>   }
>   for (var title in titles) {
>     emit(doc.author, title);
>   }
> }
>
>
> On Tue, Aug 17, 2010 at 8:17 AM, Ian Wootten wrote:
>> Thanks guys. I'd been working toward a solution with multi-level
>> keys but had missed this approach for some reason. It's nice to know
>> that at least some part of it has to be implemented in code.
>>
>> Not fully understanding what was being received by the reduce function,
>> and how it could be worked upon, was the source of my problems.
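A minimal sketch of Mark's de-duplicating map, runnable under Node outside CouchDB. The `emit` function here is a stand-in that only records rows (an assumption; inside CouchDB the view server supplies `emit`), and the two documents are hypothetical.

```javascript
// Stand-in for CouchDB's emit(): it only records [key, value] rows so the
// map can run outside the view server (assumption, for illustration).
const rows = [];
function emit(key, value) {
  rows.push([key, value]);
}

// Mark's approach: use an object as a set so each title is emitted at most
// once per document. Object.create(null) avoids any inherited keys.
function map(doc) {
  var titles = Object.create(null);
  for (var i = 0; i < doc.titles.length; i++) {
    titles[doc.titles[i]] = 1;
  }
  for (var title in titles) {
    emit(doc.author, title);
  }
}

// Two hypothetical documents by the same author.
const docs = [
  { author: "ian", titles: ["sometitle", "someothertitle", "sometitle", "anothertitle"] },
  { author: "ian", titles: ["sometitle"] },
];
docs.forEach(map);

console.log(rows);
```

Run with Node, this records four rows: the repeated "sometitle" in the first document is emitted once, while the second document still contributes its own "sometitle" row, so a later count sees that title twice (once per document).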
>>
>> Anyway, I can get what I require from my view now. Thanks for the help.
>>
>> On 17 August 2010 11:37, Robert Newson wrote:
>>> If you emit([doc.author, doc.titles[i]], 1) instead, you could
>>> use the built-in Erlang reduce function "_sum", which is
>>> faster.
>>>
>>> B.
>>>
>>> On Tue, Aug 17, 2010 at 10:24 AM, Martin Higham wrote:
>>>> I think it would be better to use the view's map function to split the
>>>> titles and create a list of authors and titles. A map function such as
>>>>
>>>> function(doc) {
>>>>   for (var i = 0; i < doc.titles.length; i++) {
>>>>     emit([doc.author, doc.titles[i]], null);
>>>>   }
>>>> }
>>>>
>>>> does just this.
>>>>
>>>> You now have a list of keys in the form [author, title], sorted
>>>> by author.
>>>>
>>>> It's then easy to produce a list of unique author/title combinations
>>>> and a count of their frequency with the reduce function (when the view
>>>> is queried with group=true):
>>>>
>>>> function(keys, values, rereduce) {
>>>>   if (rereduce) {
>>>>     return sum(values);
>>>>   } else {
>>>>     return values.length;
>>>>   }
>>>> }
>>>>
>>>> However, it is difficult for reduce to produce a list of the top 3. Any
>>>> processing within the reduce can only operate on the data passed in; it
>>>> doesn't know what data is yet to come. If you were to output only the
>>>> top 3 entries passed in to a given invocation of the reduce, you would
>>>> produce inaccurate results, as you would potentially throw away rows
>>>> that might yet accumulate into the all-time top 3.
>>>>
>>>> Once you have a list of unique author/title pairs and their frequency,
>>>> you can either sort and filter them within the client or within a list
>>>> function.
>>>>
>>>> Hope this helps
>>>>
>>>> Martin
>>>>
>>>>
>>>> On 17 August 2010 09:26, Ian Wootten wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I was hoping somebody might be able to solve a problem I'm having
>>>>> attempting to implement a view at the moment.
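The following sketch illustrates Martin's point about why a top 3 cannot safely be taken inside reduce. It swaps in a hypothetical representation (partial results as {title: count} objects, with made-up titles a to e) rather than CouchDB's actual data flow, and the helper names `merge` and `topN` are invented for illustration.

```javascript
// Merge several {title: count} objects, adding counts for matching titles.
function merge(parts) {
  const out = {};
  for (const part of parts) {
    for (const title in part) {
      out[title] = (out[title] || 0) + part[title];
    }
  }
  return out;
}

// Keep the n highest counts (ties broken alphabetically by title).
function topN(counts, n) {
  const out = {};
  Object.keys(counts)
    .sort((a, b) => counts[b] - counts[a] || a.localeCompare(b))
    .slice(0, n)
    .forEach((title) => { out[title] = counts[title]; });
  return out;
}

// Hypothetical partial counts for one author, as two separate reduce
// calls might see them.
const chunkA = { a: 5, b: 4, c: 3, d: 2 };
const chunkB = { d: 4, e: 1 };

// Correct: add up all counts first, then pick the top 3 (in the client
// or a list function, as Martin suggests).
const exact = topN(merge([chunkA, chunkB]), 3);

// Wrong: truncate each partial result to its top 3 inside "reduce".
// chunkA's d: 2 is discarded before it can be combined with chunkB's d: 4.
const truncated = topN(merge([topN(chunkA, 3), topN(chunkB, 3)]), 3);

console.log(exact, truncated);
```

Here `exact` is { d: 6, a: 5, b: 4 }, while `truncated` is { a: 5, b: 4, d: 4 }: the most frequent title drops to third place because two of its occurrences were thrown away early. Counting first and ranking afterwards avoids this.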
>>>>>
>>>>> Essentially, it takes a collection of documents, each of which
>>>>> has a single author and a list of titles (which are possibly
>>>>> repeated). There may be multiple documents by the same author, with
>>>>> the same titles within. Here's an example doc:
>>>>>
>>>>> doc.author
>>>>> doc.titles = ['sometitle', 'someothertitle', 'sometitle', 'anothertitle']
>>>>>
>>>>> I would like to return a list of the top 3 titles for each
>>>>> author across all documents. I have tried and failed for several days
>>>>> to get this working correctly.
>>>>>
>>>>> So far, my map is as follows, giving the unique titles for a document,
>>>>> not ordered at all:
>>>>>
>>>>> function(doc) {
>>>>>
>>>>>   var unique_titles = [];
>>>>>
>>>>>   for(var i in doc.titles)
>>>>>   {
>>>>>     var count=0;
>>>>>
>>>>>     for(var j in unique_titles)
>>>>>     {
>>>>>       if(doc.titles[i]==unique_titles[j])
>>>>>       {
>>>>>         count++;
>>>>>       }
>>>>>     }
>>>>>
>>>>>     if(count==0)
>>>>>     {
>>>>>       unique_titles.push(doc.titles[i]);
>>>>>     }
>>>>>   }
>>>>>
>>>>>   for(var k=0; k<unique_titles.length; k++)
>>>>>   {
>>>>>     emit(doc.author, unique_titles[k]);
>>>>>   }
>>>>> }
>>>>>
>>>>> My reduce is as follows; it returns two unique titles from a single
>>>>> document when only a single document exists for an author (I think):
>>>>>
>>>>> function(keys, values, rereduce) {
>>>>>   return values.splice(0,2);
>>>>> }
>>>>>
>>>>> My problem is that:
>>>>>
>>>>> a) I can't return more than 2 items from the values array (if I set
>>>>> the splice length to 3, Futon spits back a non-reducing error at me).
>>>>> b) Where multiple documents exist for the same author, in some
>>>>> instances I see a weird multi-dimensional array returned (rather than
>>>>> just two values).
>>>>> For example:
>>>>> [['sometitle','someothertitle'],['anothertitle'],['afurthertitle']]
>>>>>
>>>>> Presumably b) is the result of multiple documents for a single author
>>>>> interfering with one another, though I'm confused as to how to
>>>>> configure my map/reduce in order to get the information I'm after (I
>>>>> also wonder if it's even possible).
>>>>>
>>>>> I've attempted to understand the documentation on reduce functions,
>>>>> and looked at the many examples that exist too, but I'm unable to
>>>>> understand them well enough to solve my problem.
>>>>>
>>>>> I'd appreciate any help on this!
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ian
>
> --
> Mark J. Reed
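Issue b) is what rereduce looks like from inside the function: on the rereduce pass, CouchDB hands the reduce function its own earlier outputs as `values` (with `keys` set to null), so a reduce that returns an array of titles ends up combining arrays of arrays. The sketch below simulates a first pass over two chunks followed by one rereduce. The chunking, the `null` keys on the first pass, the names `spliceReduce`/`countReduce`, and the local `sum` stand-in are assumptions made so it runs outside CouchDB.

```javascript
// Stand-in for the view server's built-in sum() (assumption).
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// Ian's original reduce: keep the first two values.
function spliceReduce(keys, values, rereduce) {
  return values.splice(0, 2);
}

// Martin's counting reduce: count rows on the first pass, add the partial
// counts together on rereduce. Output shape is the same on both passes.
function countReduce(keys, values, rereduce) {
  return rereduce ? sum(values) : values.length;
}

// Hypothetical split of one author's map output into two chunks, each
// reduced separately and then rereduced together.
const nested = spliceReduce(null, [
  spliceReduce(null, ["sometitle", "someothertitle", "afurthertitle"], false),
  spliceReduce(null, ["anothertitle"], false),
], true);

const total = countReduce(null, [
  countReduce(null, [1, 1, 1], false),
  countReduce(null, [1], false),
], true);

console.log(JSON.stringify(nested)); // [["sometitle","someothertitle"],["anothertitle"]]
console.log(total); // 4
```

The array-returning reduce reproduces the nested shape Ian saw, while the counting reduce gives the same kind of answer (a number) no matter how the rows were split, which is what makes it safe under rereduce.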