incubator-couchdb-user mailing list archives

From Sean Copenhaver <sean.copenha...@gmail.com>
Subject Re: Complex queries & results
Date Thu, 26 May 2011 11:26:09 GMT
Have you tried a map function that emits 'a_name' as the key, then a reduce
function that uses the built-in sum(), and then querying with group=true?

A little info:
group option:
https://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
look at the last example before 'Enforcing Uniqueness'
http://guide.couchdb.org/draft/cookbook.html#aggregate
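To make that concrete, here is a minimal sketch. The view functions use the field names from the original message; everything below the divider is only a stand-in for CouchDB's own view engine (emit, sum, grouping), included so the example runs standalone:

```javascript
// Sketch of the suggested view. Appending a_name to the key means rows can
// still be range-filtered on the [one_id, another_id, created_at] prefix
// with startkey/endkey, and then summed per distinct key with group=true.
function map(doc) {
  emit([doc.one_id, doc.another_id, doc.created_at, doc.a_name], 1);
}

function reduce(keys, values, rereduce) {
  return sum(values); // CouchDB's built-in sum()
}

// --- Below here only simulates CouchDB so the sketch is self-contained ---
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Run the map step over some sample docs, then apply group=true semantics:
// one reduce call per distinct key.
var docs = [
  { one_id: 1, another_id: 22, created_at: "2011-05-26", a_name: "Lisa" },
  { one_id: 1, another_id: 22, created_at: "2011-05-26", a_name: "John" },
  { one_id: 1, another_id: 22, created_at: "2011-05-26", a_name: "John" }
];
docs.forEach(function (doc) { map(doc); });

var grouped = {};
rows.forEach(function (row) {
  var k = JSON.stringify(row.key);
  (grouped[k] = grouped[k] || []).push(row.value);
});
var result = {};
Object.keys(grouped).forEach(function (k) {
  result[JSON.parse(k)[3]] = reduce(null, grouped[k], false);
});
// result is now { "Lisa": 1, "John": 2 }
```

Against the real view you would query with ?group=true (plus startkey/endkey on the [one_id, another_id, created_at] prefix, or group_level, to restrict the range) and CouchDB returns one summed row per distinct key.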

On Thu, May 26, 2011 at 3:12 AM, Torstein Krause Johansen <
torsteinkrausework@gmail.com> wrote:

> Hi all,
>
> I'm having trouble solving the following problem with CouchDB, and I'm
> wondering whether I'm trying to use Couch for something it isn't suited to,
> whether I've misunderstood something, or whether there's some hidden feature
> I haven't discovered yet.
>
> I have documents with the following fields:
> {
>  one_id : 1,
>  another_id : 22,
>  created_at : "2011-05-26",
>  a_name : "Lisa"
> }
>
> I want to find all occurrences matching a combination of the first three
> fields as query parameters, and then count the occurrences of a_name within
> each of those result sets. For this reason, I put this into my
> view/map.js: emit([one_id, another_id, created_at], a_name);
>
> Now, using these keys and start/end key, I get the result rows I want. So
> far so good.
>
> Next, I want to count the occurrences of a_name within each of these hits,
> producing a dictionary like:
> {
>  "John" : 234142,
>  "Dominique" : 21177,
>  "Lisa" : 123
> }
>
> Initially, I tried to do this with a reduce.js, but couldn't work out how to
> go about it. The documentation I've read on reduces only mentions simple
> (built-in) functions for counting and summing the total rows, whereas what I
> want here are counts based on the values themselves acting as "keys" in the
> view's result.
>
> I've managed to get this working using (exploiting?) lists, but that
> approach doesn't scale well to hundreds of thousands of rows.
>
> For these reasons, I've resorted to doing two view operations: one to get
> the initial results and one to get the count of each a_name within the first
> result. This works, but doesn't feel optimal. Also, the dataset returned by
> the first search is overwhelming, leading to a ~5-7 second download of the
> data (and putting nginx/gzip in front of Couch didn't improve matters enough
> :-)
>
> The total time for my two queries adds up to ~6-9 seconds, which is not fast
> enough for my application, so I'm seeking your guidance.
>
> Cheers,
>
> -Torstein
>


-- 
“The limits of language are the limits of one's world.” -Ludwig
Wittgenstein
