couchdb-user mailing list archives

From Torstein Krause Johansen <>
Subject Re: Complex queries & results
Date Mon, 06 Jun 2011 12:12:22 GMT
Hi Benjamin,

and thanks for your comments.

On 31/05/11 22:11, Benjamin Young wrote:
> On 5/27/11 5:16 AM, Torstein Krause Johansen wrote:

>>> ?group=true&group_level=2&startkey=["2011-05-26"]&endkey=["2011-05-27",
>>> {}]
>>> results in:
>>> {
>>> "key": ["2011-05-26", "Lisa"],
>>> "value": 1
>>> },
>>> {
>>> "key": ["2011-05-26", "John"],
>>> "value": 2
>>> },
>>> {
>>> "key": ["2011-05-27", "John"],
>>> "value": 1
>>> }
>>> You can of course emit not just days, but also weeks, months,
>>> quarters if that's what you always want. If it's arbitrary and you need
>>> to aggregate the names afterwards from this smaller set, you should do
>>> it in the client (whoever calls CouchDB to get this information out).
>> Mhmm, ok, thanks for explaining this.
>> It means though, that for every unique time stamp that a_name has an
>> entry, there will be a corresponding count returned (like the three
>> you listed above).
>> Hence, if a_name has 1000 entries at slightly different times within
>> the time range I'm searching for (my created_at includes seconds), I
>> will get 1000 such entries back.
> It really just depends on what you want to count/reduce/etc. If you only
> want a count of the names (and don't want additional
> granularity--name+year counts) then just return the name as the index.
> If you want the count of names by year/month/day, etc, then return those
> *after* the name, so you can add specificity by incrementing your
> group_level param.
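If I understand this correctly, the map function would put the name first and the date parts after it, something like this (my own sketch; splitting created_at into year/month/day parts is an assumption):

```javascript
// Map sketch: name first, then year/month/day, so group_level picks the
// granularity: group_level=1 counts per name, group_level=2 per name+year, ...
var map = function (doc) {
  if (doc.a_name && doc.created_at) {
    var d = doc.created_at;              // e.g. "2011-05-26 23:22:11"
    emit([doc.a_name, d.substr(0, 4), d.substr(5, 2), d.substr(8, 2)], 1);
  }
};
// Reduce: the built-in "_sum" (or "_count") totals the emitted 1s.
```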

There's probably something I haven't understood here. If I add my 
search fields after a_name, how can I limit my search with startkey and 
endkey when a_name cannot be included in them (since the name is what I 
want to count on)?

Just to be sure, I want to re-state what I want: I have documents with 
the following fields:

     one_id : 1,
     another_id : 22,
     created_at : "2011-05-26 23:22:11",
     a_name : "Lisa"

I want to be able to search all documents using a combination of the 
first three fields as query parameters, and then count the number of 
a_name occurrences within each of these result sets.

There will be many entries like the one above (say 30,000), where the 
only difference is the created_at field. Searching within a created_at 
range such as:

     created_at > "2011-05-26 23:30:00"
     created_at < "2011-05-27 01:00:00"

I want to end up with a dictionary listing the names and their counts 
within the matching results:

    "Lisa" : 132
    "John" : 16

If I put [created_at, one_id, another_id, a_name] in the key, I can use 
start and end keys:
startkey=["2011-05-26 23:30:00",1,22]&
endkey=["2011-05-27 01:00:00",2,23]
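For completeness, the map function behind this key layout would be something like the following (my own sketch, assuming the built-in _sum reduce):

```javascript
// Map sketch for the [created_at, one_id, another_id, a_name] key layout.
var map = function (doc) {
  if (doc.created_at && doc.a_name) {
    emit([doc.created_at, doc.one_id, doc.another_id, doc.a_name], 1);
  }
};
// Reduce: "_sum", queried with ?group=true plus the startkey/endkey above.
```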

I will get results like these:
   "key": ["2011-05-26 23:30:10", 1, 22, "Lisa"],
   "value": 1
   "key": ["2011-05-26 23:30:12", 1, 22, "Lisa"],
   "value": 3
   "key": ["2011-05-26 23:33:43", 1, 22, "Lisa"],
   "value": 5

This gives me quite a big result set, since there are so many hits where 
created_at differs only slightly.
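Doing the final aggregation in the client, as suggested earlier, would then collapse those rows into the per-name dictionary (a sketch; countByName is my own name for it):

```javascript
// Collapse per-timestamp rows into per-name totals on the client side.
// "rows" has the shape of the grouped view results above.
function countByName(rows) {
  var counts = {};
  rows.forEach(function (row) {
    var name = row.key[3];                 // a_name is the 4th key element
    counts[name] = (counts[name] || 0) + row.value;
  });
  return counts;
}
```

With the three example rows above this yields { "Lisa": 9 }.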

> Alternatively, if you want to count *just* the names and *just* the
> dates, you'll need two indexes: one for names and one for dates, as you
> can't "skip" the key groups (as your example tried to do with [{},...]).
> Basically, you'll need an additional view/index for each key you're
> wanting to count + whatever output you want to make the counting more
> granular (in this case, date).

Mhmm. So in this case, it means I need an index for one_id, another_id 
and a_name (three in total)? If so, I'm puzzled as to how I can make use 
of these indexes with just one GET request.


Initially, I got something working for my use case using two indexes: 
one to get the a_name values based on the search parameters one_id, 
another_id & created_at. Querying the second index, I got the number 
of occurrences of a_name within the hits returned by the first query.

However, this didn't feel optimal (although I've read posts on the 
mailing list from people doing two batches of queries before), so I 
tried to go down a different road, as described above.

Best regards,

