incubator-couchdb-user mailing list archives

From Sean Copenhaver <sean.copenha...@gmail.com>
Subject Re: Complex queries & results
Date Mon, 06 Jun 2011 14:09:04 GMT
I would just like to add that a CouchDB view represents a single-dimensional
index, "index" in the same sense as in a relational database. An array key
(a list of keys) is like specifying sub-ordering: order by the first element,
then by the second, then by the third, and so on.
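To make the "first element, then the second" ordering concrete, here is a
simplified JavaScript sketch of how array keys collate. It is an
approximation under stated assumptions: it ranks only numbers, strings,
arrays and objects (in that order, as CouchDB does), treats {} as a "sorts
last" sentinel, and compares strings with plain < rather than the ICU
collation CouchDB actually uses.

```javascript
// Simplified sketch of CouchDB-style key collation, for illustration only.
// Real CouchDB also orders null and booleans, and compares strings with ICU.
function typeRank(v) {
  if (typeof v === "number") return 0;
  if (typeof v === "string") return 1;
  if (Array.isArray(v)) return 2;
  return 3; // anything else (e.g. the {} sentinel) ranks last in this sketch
}

function compareKeys(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 2) {
    // Arrays compare element by element: first element, then second, ...
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = compareKeys(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  if (ra === 3) return 0; // objects are not compared further here
  return a < b ? -1 : a > b ? 1 : 0;
}

const keys = [
  ["2011-05-27", "John"],
  ["2011-05-26", "Lisa"],
  ["2011-05-26", "John"],
];
keys.sort(compareKeys);
console.log(JSON.stringify(keys));
// → [["2011-05-26","John"],["2011-05-26","Lisa"],["2011-05-27","John"]]
```

Because {} ranks after every string here, an endkey such as
["2011-05-27", {}] sorts after every ["2011-05-27", name] key.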

It sounded like at some point you may have been a bit confused as to what a
map function is defining. It is defining the index's key and the value
mapped to that key. This gives an otherwise completely unstructured
assortment of data a view of consistent structure.
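As a minimal sketch, a map function for documents shaped like the one in the
quoted message (one_id, another_id, created_at, a_name) might look like
this. The in-memory emit() and the sample documents are hypothetical
stand-ins for what CouchDB provides, and the key
[one_id, another_id, created_at] with a_name as the value is one possible
choice, not the only one.

```javascript
// Hypothetical stand-in for CouchDB's emit(): collects rows in memory.
const rows = [];
let currentDocId = null;
function emit(key, value) {
  rows.push({ id: currentDocId, key: key, value: value });
}

// Map function: the key holds the fields to filter on,
// the value holds the name we want to count.
function map(doc) {
  if (doc.a_name && doc.created_at) {
    emit([doc.one_id, doc.another_id, doc.created_at], doc.a_name);
  }
}

// Hypothetical sample documents (not real data).
const docs = [
  { _id: "a", one_id: 1, another_id: 22, created_at: "2011-05-26 23:22:11", a_name: "Lisa" },
  { _id: "b", one_id: 1, another_id: 22, created_at: "2011-05-26 23:40:02", a_name: "John" },
  { _id: "c", one_id: 2, another_id: 7 },
];
for (const doc of docs) {
  currentDocId = doc._id;
  map(doc);
}
console.log(rows.length); // 2: doc "c" has no a_name, so it emits nothing
```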

Anyway, back to what you are trying to accomplish. Honestly, it sounds like
what you want is too advanced for the built-in _count or _sum reduce
functions. Have you tried writing a custom reduce function that does the
grouping the way you want, basically by name alone?

https://gist.github.com/1010318

I tried this out with 10 docs fitting your example structure and, with a
plain query (no grouping, no filtering, reduce on), I got back:

{ John: 4, Jane: 6 }

Maybe this example can get you started. In the map function I used the keys
I did because it looks like you are interested in filtering on them. The
query I used for the example results above actually didn't use the key at
all (no filtering).

On Mon, Jun 6, 2011 at 8:12 AM, Torstein Krause Johansen
<torsteinkrausework@gmail.com> wrote:

> Hi Benjamin,
>
> and thanks for your comments.
>
>
> On 31/05/11 22:11, Benjamin Young wrote:
>
>> On 5/27/11 5:16 AM, Torstein Krause Johansen wrote:
>>
>
>>>> ?group=true&group_level=2&startkey=["2011-05-26"]&endkey=["2011-05-27",{}]
>>>>
>>>> results in:
>>>>
>>>> {
>>>> "key": ["2011-05-26", "Lisa"],
>>>> "value": 1
>>>> },
>>>> {
>>>> "key": ["2011-05-26", "John"],
>>>> "value": 2
>>>> },
>>>> {
>>>> "key": ["2011-05-27", "John"],
>>>> "value": 1
>>>> }
>>>>
>>>> You can of course emit not just days, but also weeks, months or
>>>> quarters if that's what you always want. If it is arbitrary and you need
>>>> to aggregate the names afterwards from this smaller set, you should do
>>>> it in the client (whoever calls CouchDB to get this information out).
>>>>
>>>
>>> Mhmm, ok, thanks for explaining this.
>>>
>>> It means, though, that for every unique timestamp at which a_name has an
>>> entry, there will be a corresponding count returned (like the three
>>> you listed above).
>>>
>>> Hence, if a_name has 1000 entries at slightly different times within
>>> the time range I'm searching for (my created_at includes seconds), I
>>> will get 1000 such entries back.
>>>
>>
>> It really just depends on what you want to count/reduce/etc. If you only
>> want a count of the names (and don't want additional granularity, such
>> as name+year counts), then just emit the name as the key.
>> If you want the count of names by year/month/day, etc., then emit those
>> *after* the name, so you can add specificity by incrementing your
>> group_level param.
>>
>
> There's probably something I haven't understood here. If I add my search
> fields after a_name, how can I limit my search with startkey and endkey
> when a_name cannot be included in the start and end keys (since the name
> is what I want to count on)?
>
> Just to be sure, I want to re-state what I want: I have documents with the
> following fields:
>
> {
>    one_id : 1,
>    another_id : 22,
>    created_at : "2011-05-26 23:22:11",
>    a_name : "Lisa"
> }
>
> I want to be able to search all occurrences using a combination of the
> first three fields as query parameters and then count the number of a_name
> occurrences within each of these search collections.
>
> There will be many entries like the one above (say 30,000), where the only
> difference is the created_at field. Searching for these variable parameters:
>
>    one_id=1,
>    another_id=22,
>    created_at > "2011-05-26 23:30:00"
>    created_at < "2011-05-27 01:00:00"
>
> I want to end up with a dictionary listing the names and their count
> matching the search parameters:
>
> {
>   "Lisa" : 132,
>   "John" : 16
> }
>
> If I put [created_at, one_id, another_id, a_name] in the key, I can use the
> start and end keys:
> ?group=true&
> group_level=4&
> startkey=["2011-05-26 23:30:00",1,22]&
> endkey=["2011-05-27 01:00:00",2,23]
>
> I will get results like these:
> {
>  "key": ["2011-05-26 23:30:10", 1, 22, "Lisa"],
>  "value": 1
> },
> {
>  "key": ["2011-05-26 23:30:12", 1, 22, "Lisa"],
>  "value": 3
> },
> {
>  "key": ["2011-05-26 23:33:43", 1, 22, "Lisa"],
>  "value": 5
> },
> [..]
>
> This gives me quite a big result set, since there are so many hits where
> created_at differs only slightly.
>
>
>> Alternatively, if you want to count *just* the names and *just* the
>> dates, you'll need two indexes, one for names and one for dates, as you
>> can't "skip" the key groups (as your example tried to do with [{},...]).
>>
>> Basically, you'll need an additional view/index for each key you're
>> wanting to count + whatever output you want to make the counting more
>> granular (in this case, date).
>>
>
> Mhmm. So in this case, does it mean I need an index each for one_id,
> another_id and a_name (three in total)? If so, I'm puzzled as to how I can
> make use of these indexes with just one GET request.
>
> [..]
>
> Initially, I got something working for my use case using two indexes: one
> to get the a_name values based on the search queries a_value,
> another_value & created_at. Querying the second index, I got the number of
> occurrences for a_name within the hits returned from the first query.
>
> However, this didn't feel optimal (although I've read posts on the mailing
> list from people doing two batches of queries), so I tried to go down a
> different road, as described above.
>
> Best regards,
>
> -Torstein
>
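The group_level behaviour discussed in this thread can be sketched as
follows: each array key is truncated to its first group_level elements, and
adjacent rows with equal truncated keys are combined. The rows and the
summing are assumptions (mirroring a _count or _sum style reduce over values
of 1). The first data set has the shape of the
[created_at, one_id, another_id, a_name] key; the second puts the name first.

```javascript
// Sketch of group_level: truncate keys, then combine adjacent equal keys.
// Assumes rows are already sorted by key, as view rows are.
function groupRows(rows, groupLevel) {
  const out = [];
  for (const row of rows) {
    const key = row.key.slice(0, groupLevel);
    const last = out[out.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(key)) {
      last.value += row.value; // same group: sum, like _sum/_count
    } else {
      out.push({ key: key, value: row.value }); // new group
    }
  }
  return out;
}

// Hypothetical rows with the timestamp first in the key.
const byTime = [
  { key: ["2011-05-26 23:30:10", 1, 22, "Lisa"], value: 1 },
  { key: ["2011-05-26 23:30:12", 1, 22, "Lisa"], value: 1 },
  { key: ["2011-05-26 23:33:43", 1, 22, "Lisa"], value: 1 },
];
console.log(groupRows(byTime, 4).length); // 3: one row per distinct timestamp

// Hypothetical rows with the name first in the key.
const byName = [
  { key: ["John", "2011-05-26 23:31:00"], value: 1 },
  { key: ["Lisa", "2011-05-26 23:30:10"], value: 1 },
  { key: ["Lisa", "2011-05-26 23:33:43"], value: 1 },
];
console.log(JSON.stringify(groupRows(byName, 1)));
// [{"key":["John"],"value":1},{"key":["Lisa"],"value":2}]
```

This is why group_level=4 on the timestamp-first key yields one row per
distinct timestamp, while group_level=1 on a name-first key yields one row
per name. With the name first, however, startkey and endkey range over
names, so a single contiguous key range cannot also restrict every name to
the same date window.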



-- 
“The limits of language are the limits of one's world.” -Ludwig von
Wittgenstein
