couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Warner Onstine <warn...@gmail.com>
Subject Re: querying a view/reduce
Date Thu, 30 Dec 2010 03:19:44 GMT
On Wed, Dec 29, 2010 at 12:12 PM, Michael Miller <mike@cloudant.com> wrote:
> Hi Warner,
>
> Ok, that's useful.  We've seen this request often at Cloudant and I think that the best
approach is to think back to what you actually want to display in your front-end.  If the
goal is to give rankings on a yearly, monthly, weekly, and by-day categories, then I would
likely use a handful of indices to accomplish that.    With chainable mapreduce (Cloudant/bigcouch
"dbcopy"), then you would use a map/reduce => dbcopy => map to get sorted lists of rankings
in the pre-binned granularity that you desire.   Let's consider the case of doing a weekly
ranking.  I would do:
>
> 1) map: emit([year, month, week, player-id], completion).
>
> 2) reduce: _sum.
>
> 3) dbcopy.  This will capture the group_level=true information as documents in a db
that can be mapreduce consumed.
>
> 4) map: emit([year,month,week, summed_score], player-id).  This will give you an index
sorted by high-score, binned on time.  We're essentially swapping the last entry in the compound
key for the summed value from the reduce.
>
> Then at query time you may need to do two queries (if you straddle a monthly boundary)
and a quick sum/sort on the leaderboard lists.

Since what I'm doing is monthly rankings there won't be a need to do
this (well, as far as I know now, it may change in the future).

So, is dbcopy something built-in somewhere? I did a google and didn't
turn up too much on it. Any help in how to implement this would be
greatly appreciated as I'm still learning the ins and outs of views in
couchdb-land.

-warner

>
> This is such a common pattern that perhaps we should just turn it into a bigcouch built-in
and do it all erlang side.
>
> -Mike
>
>
>
> On Dec 29, 2010, at 10:50 AM, Warner Onstine wrote:
>
>> On Wed, Dec 29, 2010 at 11:40 AM, Michael Miller <mike@cloudant.com> wrote:
>>> Ah, sorry, I may have flipped the question you are actually asking.  Let's back
up. Are you asking:
>>>
>>> 1) For a given user, over a given date range, give me the values for each day
in that range?
>>>
>>> 2) For a given user, over a given date range, give me a single number, the sum
over the entire range?
>>>
>>> Or do you want to do the above, but keyed first by date and 2nd by user.
>>
>> Sort of #2. What I'm attempting to do is get a ranking of player's
>> scores for a given date range so that I can rank them. So, I need to
>> query by date and I don't care about that order. What I do care about
>> is the sum of a player's completions by player id. Hopefully that's
>> clearer.
>>
>> I'm not sure if that means that I do multiple emits, or what I need to
>> do exactly. Thanks for working through this with me, gradually
>> learning how views (map/reduce/re-reduce) work in couch.
>>
>> -warner
>>
>>>
>>> -Mike
>>>
>>>
>>> On Dec 29, 2010, at 9:13 AM, Warner Onstine wrote:
>>>
>>>> I'm pretty sure that's what I'm doing already in my search. So, not
>>>> clear on how it's any different.
>>>>
>>>> Here's one of my query strings for an example (that returns these results):
>>>> _design/playerassignmentcompletion/_view/topAssignmentsByDate?startkey=%5B2010%2C11%2C1%5D&endkey=%5B2010%2C11%2C29%5D&group=true
>>>>
>>>> which translates to
>>>> year = 2010
>>>> month = dec (or 11)
>>>> day = 1 .. today's date
>>>>
>>>> I am not querying on the playerid, but I need to sum up by the player
>>>> id (the last column).
>>>>
>>>> -warner
>>>>
>>>> On Wed, Dec 29, 2010 at 10:01 AM, Michael Miller <mike@cloudant.com>
wrote:
>>>>> Not sure if you got a reply yet, but if you have keys like:
>>>>>
>>>>>> [2010, 11, 28, "fb45ffc66ad8f4f299f0bd06a3000903"]    10 <---
player 1
>>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3001c32"]    35 <---
player 2
>>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3000ccf"]    55 <---
player 3
>>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3000903"]    75 <---
player 1 (again)
>>>>>
>>>>> Can't you just aggregate over users for a given date range via (bash
escaping):
>>>>>
>>>>> GET '...?startkey=\[2010,11,28\]&endkey=\[2010,11,24\]&group_level=3'
>>>>>
>>>>> Also, you'll get better performance by swapping your reduce function
for the simple builtin '_sum'.  That is, just literally replace your js code with '_sum'.
>>>>>
>>>>> -M
>>>>>
>>>>> On Dec 29, 2010, at 7:10 AM, Warner Onstine wrote:
>>>>>
>>>>>> Hi all, I've been trying to figure out the best way to implement
this
>>>>>> particular view/reduce and have been having some issues.
>>>>>>
>>>>>> Let me explain the structure a bit and what I've done.
>>>>>>
>>>>>> I have a set of documents for storing a player's assignment completion
>>>>>> which contains a point value. I want to query for a given date range
>>>>>> for those completions and aggregate the points values. Here is the
>>>>>> view and the reduce I came up with :
>>>>>>
>>>>>> map:
>>>>>> function(doc) {
>>>>>>    if (doc.type == 'playerassignmentcompletion' && doc.approved
== true) {
>>>>>>        var completionDate = new Date(doc.completionDate);
>>>>>>        var completionYear = completionDate.getFullYear();
>>>>>>        var completionMonth = completionDate.getMonth();
>>>>>>        var completionDay = completionDate.getDate();
>>>>>>        emit([
>>>>>>            completionYear,
>>>>>>            completionMonth,
>>>>>>               completionDay,
>>>>>>            doc.playerId
>>>>>>        ], doc.pointsEarned);
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>> reduce:
>>>>>> function(keys, values, rereduce) {
>>>>>>    return sum(values);
>>>>>> }
>>>>>>
>>>>>> And then I say group=true
>>>>>>
>>>>>> I didn't realize my issue at the time because all of the completions
I
>>>>>> had were for a given date. Now, I added a completion for a different
>>>>>> date and what happens is this kind of result:
>>>>>> [2010, 11, 28, "fb45ffc66ad8f4f299f0bd06a3000903"]    10 <---
player 1
>>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3001c32"]    35 <---
player 2
>>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3000ccf"]    55 <---
player 3
>>>>>> [2010, 11, 26, "fb45ffc66ad8f4f299f0bd06a3000903"]    75 <---
player 1 (again)
>>>>>>
>>>>>> What I want is to combine all the player's total scores, but with
my
>>>>>> group=true, it is matching each of my keys exactly. I understand
why
>>>>>> it's doing it, just not how to fix my key/query to work the way I
want
>>>>>> it to.
>>>>>>
>>>>>> One thought I had was to move the id column up to the front so that
I
>>>>>> could do group_level=1. But I don't know how to specify a wildcard
>>>>>> parameter so that I get all the documents with any key and just the
>>>>>> particular date range I want.
>>>>>>
>>>>>> Hopefully this is clear enough. I can share more information if it's
>>>>>> necessary (just don't want to bog the list down with too many
>>>>>> attachments).
>>>>>>
>>>>>> Thanks, it is greatly appreciated!
>>>>>>
>>>>>> -warner
>>>>>
>>>>>
>>>
>>>
>
>

Mime
View raw message