From: Paweł Stawicki
Date: Mon, 8 Mar 2010 09:07:08 +0100
Subject: Re: Map reduce and weird output question
To: user@couchdb.apache.org

Hmm... I'm just thinking out loud and don't know if it works, but maybe try
something like this:

If you can get the number of documents per day per username, first try to make
this number always one when the key is [date, username]:

Reduce:
if (keys.length == 2) {
  return 1;
} else if (keys.length == 1) {
  // date only, return number of usernames
  return values.length;
}

The risk is that some usernames will be counted twice, but maybe try it.

Best regards
--
Paweł Stawicki
http://pawelstawicki.blogspot.com
http://szczecin.jug.pl

On Mon, Mar 8, 2010 at 08:03, Gregory Tappero wrote:
> My number of keys is 4 (year, month, day, username), so returning the number
> of keys in reduce does not seem to give me the output I am looking for.
> Unless I misunderstood something.
>
> Thank you for helping,
>
> Greg
>
> On Mon, Mar 8, 2010 at 12:28 AM, Randall Leeds wrote:
> > I'm not an expert on this, but I think you need to create your own
> > reduce function and output the number of keys rather than the sum of
> > the values.
> >
> > On Sun, Mar 7, 2010 at 15:15, Gregory Tappero wrote:
> >> Thank you Pawel,
> >>
> >> If I try to follow your way, it gives me the count of docs in a given
> >> day for each username; what I would like is the count of unique
> >> usernames for a given day.
> >>
> >> function(doc) {
> >>
> >>   if (doc.doc_type == "EdoPing" && doc.em_type == 0) {
> >>     date = new Date().setRFC3339(doc.created_at);
> >>     emit([date.getFullYear(), parseInt(date.getMonth())+1,
> >>           date.getDate(), doc.em_uname], 1);
> >>
> >>   }
> >> }
> >>
> >> Reduce:
> >> _count
> >>
> >> =================
> >> I get:
> >>
> >> [2010, 3, 3, "student1"] 5
> >> [2010, 3, 4, "student1"] 18
> >> [2010, 3, 5, "eong"] 77
> >> [2010, 3, 6, "bkante"] 71
> >> [2010, 3, 6, "jfrancillette"] 72
> >> [2010, 3, 6, "mlouviers"] 12
> >> [2010, 3, 7, "student1"] 4
> >>
> >> I would like to extract the following:
> >>
> >> [2010, 3, 3] 1
> >> [2010, 3, 4] 1
> >> [2010, 3, 5] 1
> >> [2010, 3, 6] 3
> >> [2010, 3, 7] 1
> >>
> >>
> >> If I do a group_level=3, it sums the values.
> >>
> >> {"key":[2010,3,3],"value":5},
> >> {"key":[2010,3,4],"value":18},
> >> {"key":[2010,3,5],"value":77},
> >> {"key":[2010,3,6],"value":155},
> >> {"key":[2010,3,7],"value":4}
> >>
> >> How can I count the unique usernames emitted per day?
> >>
> >>
> >> On Sun, Mar 7, 2010 at 10:02 PM, Paweł Stawicki <pawelstawicki@gmail.com> wrote:
> >>> Just emit all documents with em_type = 0 in the map function, with [date,
> >>> em_uname] as key. Then count in reduce.
> >>>
> >>> Map:
> >>> function(doc) {
> >>>   if (doc.em_type == 0) {
> >>>     // If you only want to count, you can emit anything (e.g. 1) instead of doc here.
> >>>     emit([date, em_uname], doc);
> >>>   }
> >>> }
> >>>
> >>> Reduce:
> >>> function(keys, values, rereduce) {
> >>>   if (!rereduce) {
> >>>     return count_of_values;
> >>>   } else {
> >>>     return sum_of_values;
> >>>   }
> >>>
> >>>   // If you emit 1 instead of doc, then count_of_values == sum_of_values
> >>> }
> >>>
> >>> Then you can handle everything by grouping:
> >>> http://yourserver:5984/yourdb/_view/yourview?group_level=2
> >>> or group=true
> >>>
> >>> Regards
> >>> --
> >>> Paweł Stawicki
> >>> http://pawelstawicki.blogspot.com
> >>> http://szczecin.jug.pl
> >>>
> >>>
> >>>
> >>> On Sat, Mar 6, 2010 at 16:26, Gregory Tappero wrote:
> >>>
> >>>> Hello everyone,
> >>>>
> >>>> I have the following EdoPing type of documents:
> >>>>
> >>>> {
> >>>>   "_id": "22add509c1e7bc286832edc5bfe99ce5",
> >>>>   "_rev": "1-49663ab8778f445e481143120d0d7086",
> >>>>   "doc_type": "EdoPing",
> >>>>   "em_uname": "student1",
> >>>>   "em_gid": 1,
> >>>>   "created_at": "2010-03-03T14:18:19Z",
> >>>>   "em_ip": "92.154.70.148",
> >>>>   "em_type": 0,
> >>>>   "room_url": "z2fudcvcrfa3reaydatre",
> >>>>   "room_users": [
> >>>>     "tutorsbox"
> >>>>   ]
> >>>> }
> >>>>
> >>>> I would like to count all unique em_uname of em_type 0 on a given date.
> >>>>
> >>>> For now I used this map/reduce:
> >>>> http://friendpaste.com/5xUUQ26bbl9d5KRB8eojwe
> >>>>
> >>>> Date.prototype.setRFC3339 = function(dString){
> >>>>   var regexp =
> >>>>     /(\d\d\d\d)(-)?(\d\d)(-)?(\d\d)(T)?(\d\d)(:)?(\d\d)(:)?(\d\d)(\.\d+)?(Z|([+-])(\d\d)(:)?(\d\d))/;
> >>>>
> >>>>   if (dString.toString().match(new RegExp(regexp))) {
> >>>>     var d = dString.match(new RegExp(regexp));
> >>>>     var offset = 0;
> >>>>
> >>>>     this.setUTCDate(1);
> >>>>     this.setUTCFullYear(parseInt(d[1],10));
> >>>>     this.setUTCMonth(parseInt(d[3],10) - 1);
> >>>>     this.setUTCDate(parseInt(d[5],10));
> >>>>     this.setUTCHours(parseInt(d[7],10));
> >>>>     this.setUTCMinutes(parseInt(d[9],10));
> >>>>     this.setUTCSeconds(parseInt(d[11],10));
> >>>>     if (d[12])
> >>>>       this.setUTCMilliseconds(parseFloat(d[12]) * 1000);
> >>>>     else
> >>>>       this.setUTCMilliseconds(0);
> >>>>     if (d[13] != 'Z') {
> >>>>       offset = (d[15] * 60) + parseInt(d[17],10);
> >>>>       offset *= ((d[14] == '-') ? -1 : 1);
> >>>>       this.setTime(this.getTime() - offset * 60 * 1000);
> >>>>     }
> >>>>   } else {
> >>>>     this.setTime(Date.parse(dString));
> >>>>   }
> >>>>   return this;
> >>>> };
> >>>>
> >>>> var seenKeys = new Array();
> >>>>
> >>>> function(doc) {
> >>>>
> >>>>   if (doc.doc_type == "EdoPing" && doc.em_type == 0) {
> >>>>     date = new Date().setRFC3339(doc.created_at);
> >>>>     var key = doc.em_uname + String(doc.created_at).substring(0,10);
> >>>>     if (seenKeys[key] == undefined) {
> >>>>       seenKeys[key] = 1;
> >>>>       emit([date.getFullYear(), parseInt(date.getMonth())+1,
> >>>>             date.getDate()], 1);
> >>>>     }
> >>>>   }
> >>>> }
> >>>>
> >>>>
> >>>> It works when saved for the first time, but as soon as new EdoPings
> >>>> get added it starts emitting rows it has already seen (same key),
> >>>> creating faulty count results.
> >>>>
> >>>> Is it OK to have seenKeys outside of the doc function()?
> >>>> What other way could I use to get the same results?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Greg
> >>>>
> >>>
> >>
> >> --
> >> Greg Tappero
> >> CTO co founder Edoboard
> >> http://www.edoboard.com
> >> +33 0645764425
> >>
> >
> --
> Greg Tappero
> CTO co founder Edoboard
> http://www.edoboard.com
> +33 0645764425
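
A minimal sketch of one way to get the per-day unique count without changing the view quoted above. It assumes the map function that emits [year, month, day, em_uname] with the _count reduce, queried with group_level=4, so that each returned row is one distinct (day, username) pair; the number of rows sharing a [year, month, day] prefix is then the number of unique usernames for that day. The helper name countUniqueUsersPerDay is hypothetical, and it runs on the client against the "rows" array of the view response, not inside CouchDB.

```javascript
// Hypothetical client-side helper, not part of CouchDB.
// `rows` is assumed to be the "rows" array of a view response obtained with
// group_level=4, i.e. one row per distinct [year, month, day, username] key.
function countUniqueUsersPerDay(rows) {
  var byDay = {};  // "year-month-day" -> { key: [y, m, d], value: count }
  var order = [];  // day ids in first-seen order, to keep the output sorted
  rows.forEach(function (row) {
    var day = row.key.slice(0, 3);  // drop the username component
    var id = day.join("-");
    if (!Object.prototype.hasOwnProperty.call(byDay, id)) {
      byDay[id] = { key: day, value: 0 };
      order.push(id);
    }
    // Each row is a distinct username for that day, so count rows,
    // not the per-user document counts stored in row.value.
    byDay[id].value += 1;
  });
  return order.map(function (id) { return byDay[id]; });
}

// Sample rows copied from the output quoted in this thread.
var rows = [
  { key: [2010, 3, 3, "student1"], value: 5 },
  { key: [2010, 3, 4, "student1"], value: 18 },
  { key: [2010, 3, 5, "eong"], value: 77 },
  { key: [2010, 3, 6, "bkante"], value: 71 },
  { key: [2010, 3, 6, "jfrancillette"], value: 72 },
  { key: [2010, 3, 6, "mlouviers"], value: 12 },
  { key: [2010, 3, 7, "student1"], value: 4 }
];

countUniqueUsersPerDay(rows).forEach(function (r) {
  console.log(JSON.stringify(r.key), r.value);
});
```

With the sample rows this prints one line per day, with 3 for [2010,3,6] and 1 for the other days. Counting on the client avoids keeping state such as seenKeys in the map function, at the cost of transferring one row per (day, username) pair.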