incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Miller <m...@cloudant.com>
Subject Re: querying a view/reduce
Date Wed, 29 Dec 2010 19:12:33 GMT
Hi Warner,

Ok, that's useful.  We've seen this request often at Cloudant and I think that the best approach
is to think back to what you actually want to display in your front-end.  If the goal is to
give rankings on a yearly, monthly, weekly, and by-day categories, then I would likely use
a handful of indices to accomplish that.    With chainable mapreduce (Cloudant/bigcouch "dbcopy"),
then you would use a map/reduce => dbcopy => map to get sorted lists of rankings in
the pre-binned granularity that you desire.   Let's consider the case of doing a weekly ranking.
 I would do:

1) map: emit([year, month, week, player-id], completion).  

2) reduce: _sum.  

3) dbcopy.  This will capture the group_level=true information as documents in a db that can
be mapreduce consumed.

4) map: emit([year,month,week, summed_score], player-id).  This will give you an index sorted
by high-score, binned on time.  We're essentially swapping the last entry in the compound
key for the summed value from the reduce.

Then at query time you may need to do two queries (if you straddle a monthly boundary) and
a quick sum/sort on the leaderboard lists. 

This is such a common pattern that perhaps we should just turn it into a bigcouch built-in
and do it all erlang side.

-Mike



On Dec 29, 2010, at 10:50 AM, Warner Onstine wrote:

> On Wed, Dec 29, 2010 at 11:40 AM, Michael Miller <mike@cloudant.com> wrote:
>> Ah, sorry, I may have flipped the question you are actually asking.  Let's back up.
Are you asking:
>> 
>> 1) For a given user, over a given date range, give me the values for each day in
that range?
>> 
>> 2) For a given user, over a given date range, give me a single number, the sum over
the entire range?
>> 
>> Or do you want to do the above, but keyed first by date and 2nd by user.
> 
> Sort of #2. What I'm attempting to do is get a ranking of player's
> scores for a given date range so that I can rank them. So, I need to
> query by date and I don't care about that order. What I do care about
> is the sum of a player's completions by player id. Hopefully that's
> clearer.
> 
> I'm not sure if that means that I do multiple emits, or what I need to
> do exactly. Thanks for working through this with me, gradually
> learning how views (map/reduce/re-reduce) work in couch.
> 
> -warner
> 
>> 
>> -Mike
>> 
>> 
>> On Dec 29, 2010, at 9:13 AM, Warner Onstine wrote:
>> 
>>> I'm pretty sure that's what I'm doing already in my search. So, not
>>> clear on how it's any different.
>>> 
>>> Here's one of my query strings for an example (that returns these results):
>>> _design/playerassignmentcompletion/_view/topAssignmentsByDate?startkey=%5B2010%2C11%2C1%5D&endkey=%5B2010%2C11%2C29%5D&group=true
>>> 
>>> which translates to
>>> year = 2010
>>> month = dec (or 11)
>>> day = 1 .. today's date
>>> 
>>> I am not querying on the playerid, but I need to sum up by the player
>>> id (the last column).
>>> 
>>> -warner
>>> 
>>> On Wed, Dec 29, 2010 at 10:01 AM, Michael Miller <mike@cloudant.com> wrote:
>>>> Not sure if you got a reply yet, but if you have keys like:
>>>> 
>>>>> [2010, 11, 28, "fb45ffc66ad8f4f299f0bd06a3000903"]    10 <--- player
1
>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3001c32"]    35 <--- player
2
>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3000ccf"]    55 <--- player
3
>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3000903"]    75 <--- player
1 (again)
>>>> 
>>>> Can't you just aggregate over users for a given date range via (bash escaping):
>>>> 
>>>> GET '...?startkey=\[2010,11,28\]&endkey=\[2010,11,24\]&group_level=3'
>>>> 
>>>> Also, you'll get better performance by swapping your reduce function for
the simple builtin '_sum'.  That is, just literally replace your js code with '_sum'.
>>>> 
>>>> -M
>>>> 
>>>> On Dec 29, 2010, at 7:10 AM, Warner Onstine wrote:
>>>> 
>>>>> Hi all, I've been trying to figure out the best way to implement this
>>>>> particular view/reduce and have been having some issues.
>>>>> 
>>>>> Let me explain the structure a bit and what I've done.
>>>>> 
>>>>> I have a set of documents for storing a player's assignment completion
>>>>> which contains a point value. I want to query for a given date range
>>>>> for those completions and aggregate the points values. Here is the
>>>>> view and the reduce I came up with :
>>>>> 
>>>>> map:
>>>>> function(doc) {
>>>>>    if (doc.type == 'playerassignmentcompletion' && doc.approved
== true) {
>>>>>        var completionDate = new Date(doc.completionDate);
>>>>>        var completionYear = completionDate.getFullYear();
>>>>>        var completionMonth = completionDate.getMonth();
>>>>>        var completionDay = completionDate.getDate();
>>>>>        emit([
>>>>>            completionYear,
>>>>>            completionMonth,
>>>>>               completionDay,
>>>>>            doc.playerId
>>>>>        ], doc.pointsEarned);
>>>>>    }
>>>>> }
>>>>> 
>>>>> reduce:
>>>>> function(keys, values, rereduce) {
>>>>>    return sum(values);
>>>>> }
>>>>> 
>>>>> And then I say group=true
>>>>> 
>>>>> I didn't realize my issue at the time because all of the completions
I
>>>>> had were for a given date. Now, I added a completion for a different
>>>>> date and what happens is this kind of result:
>>>>> [2010, 11, 28, "fb45ffc66ad8f4f299f0bd06a3000903"]    10 <--- player
1
>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3001c32"]    35 <--- player
2
>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3000ccf"]    55 <--- player
3
>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3000903"]    75 <--- player
1 (again)
>>>>> 
>>>>> What I want is to combine all the player's total scores, but with my
>>>>> group=true, it is matching each of my keys exactly. I understand why
>>>>> it's doing it, just not how to fix my key/query to work the way I want
>>>>> it to.
>>>>> 
>>>>> One thought I had was to move the id column up to the front so that I
>>>>> could do group_level=1. But I don't know how to specify a wildcard
>>>>> parameter so that I get all the documents with any key and just the
>>>>> particular date range I want.
>>>>> 
>>>>> Hopefully this is clear enough. I can share more information if it's
>>>>> necessary (just don't want to bog the list down with too many
>>>>> attachments).
>>>>> 
>>>>> Thanks, it is greatly appreciated!
>>>>> 
>>>>> -warner
>>>> 
>>>> 
>> 
>> 


Mime
View raw message