incubator-couchdb-user mailing list archives

From Jeremy Wall <jw...@google.com>
Subject Re: Help with complex key range query and map/reduce
Date Mon, 28 Sep 2009 14:21:07 GMT
Typing from my phone, so I can't give a code example. But from what you are
describing, I think you want a key that looks something like this:
[CategoryId, Date, domain]. You can then use reduce to return counts by
category, date, and domain. If you want counts that span categories or
dates, though, you will have to merge them in your application code.
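
A minimal sketch of that suggestion, simulated locally so it can run outside CouchDB. The `emit` and `sum` stubs stand in for the helpers CouchDB provides inside a view; `groupedRows` and `mergeByDomain` are hypothetical helper names, `groupedRows` only imitates what a grouped reduce query would return, and the sample documents and the ISO format of `searched_at` are assumptions. Note that the suggested key leaves out the engine.

```javascript
// Stand-ins for the helpers CouchDB provides inside view functions.
const rows = [];
function emit(key, value) { rows.push({ key, value }); }
function sum(values) { return values.reduce((a, b) => a + b, 0); }

// Map: key is [category_id, date, domain]; the value 1 lets reduce count docs.
// Assumes searched_at is an ISO 8601 timestamp (not confirmed by the thread).
function map(doc) {
  if (doc["couchrest-type"] === "SearchDocument" && doc.category_id && doc.searched_at) {
    const date = new Date(Date.parse(doc.searched_at)).toISOString().slice(0, 10);
    const domain = doc.ad_domain == null ? "NONE" : doc.ad_domain;
    emit([doc.category_id, date, domain], 1);
  }
}

// Reduce: summing works for both the reduce and the rereduce pass.
function reduce(keys, values, rereduce) { return sum(values); }

// Hypothetical helper imitating a grouped reduce query over a date range:
// one row per distinct [category, date, domain] key.
function groupedRows(category, fromDate, toDate) {
  const groups = new Map();
  for (const { key, value } of rows) {
    const [cat, date] = key;
    if (cat !== category || date < fromDate || date > toDate) continue;
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(value);
  }
  return [...groups].map(([k, values]) => ({
    key: JSON.parse(k),
    value: reduce(null, values, false),
  }));
}

// Application-side merge: collapse the per-date rows into one count per domain.
function mergeByDomain(grouped) {
  const histogram = {};
  for (const { key, value } of grouped) {
    const domain = key[2];
    histogram[domain] = (histogram[domain] || 0) + value;
  }
  return histogram;
}

// Made-up sample documents for illustration only.
const docs = [
  { "couchrest-type": "SearchDocument", category_id: 3, searched_at: "2009-09-01T10:00:00Z", ad_domain: "foo.com" },
  { "couchrest-type": "SearchDocument", category_id: 3, searched_at: "2009-09-01T11:00:00Z", ad_domain: "foo.com" },
  { "couchrest-type": "SearchDocument", category_id: 3, searched_at: "2009-09-02T09:00:00Z", ad_domain: "foo.com" },
  { "couchrest-type": "SearchDocument", category_id: 3, searched_at: "2009-09-02T09:30:00Z", ad_domain: "bar.com" },
  { "couchrest-type": "SearchDocument", category_id: 3, searched_at: "2009-09-03T12:00:00Z", ad_domain: null },
];
docs.forEach(map);

const grouped = groupedRows(3, "2009-09-01", "2009-09-28");
const histogram = mergeByDomain(grouped);
console.log(histogram);
```

Here 'foo.com' appears in two grouped rows (one per day), and the merge step is what folds them into a single count.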

Sent from my G1 google phone

On Sep 28, 2009 12:42 AM, "Glenn Rempe" <glenn@rempe.us> wrote:

Hello,  I am hoping the group can provide some guidance on building a
map/reduce summary for a set of documents that I have retrieved with a
complex key range query.

In summary, I want to query a range of docs by a complex key of the form
[id, year, month, day, engine].  It is common for me to need to query
over a range of dates with something along the lines of this (CouchRest Ruby
code):

  sd = SearchDocument.by_ad_domain_histogram(
    :startkey => [3, 2009, 9, 1, "g"],
    :endkey => [3, 2009, 9, 28, "g"],
    :include_docs => false,
    :reduce => true)

In this example, I want all SearchDocuments that are for category '3'
between 9/1/2009 and 9/28/2009, and for engine 'g'.
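
One caveat that may be worth checking here: CouchDB compares array keys element by element, so a range like the one above would also seem to match other engines on the days strictly between the two endpoints. The sketch below uses a simplified comparator (numbers and strings only; real CouchDB collates strings with ICU) to illustrate the ordering. `compareKeys` and `inRange` are hypothetical helper names, and the engine values "y" and "a" are made up.

```javascript
// Simplified stand-in for CouchDB view collation: handles only numbers and
// strings, with numbers sorting before strings. Not the real ICU collation.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i], y = b[i];
    if (typeof x !== typeof y) return typeof x === "number" ? -1 : 1;
    if (x < y) return -1;
    if (x > y) return 1;
  }
  return a.length - b.length; // a shorter array sorts before its extensions
}

// True when key falls within [startkey, endkey], both ends inclusive.
function inRange(key, startkey, endkey) {
  return compareKeys(startkey, key) <= 0 && compareKeys(key, endkey) <= 0;
}

const startkey = [3, 2009, 9, 1, "g"];
const endkey = [3, 2009, 9, 28, "g"];

// Same engine, mid-range day: inside the range.
const sameEngine = inRange([3, 2009, 9, 15, "g"], startkey, endkey);
// Different engine, mid-range day: also inside, because day 15 already
// decides the comparison before the engine element is reached.
const otherEngine = inRange([3, 2009, 9, 15, "y"], startkey, endkey);
// Different engine on the first day: outside, because "a" sorts before "g".
const firstDayOther = inRange([3, 2009, 9, 1, "a"], startkey, endkey);

console.log({ sameEngine, otherEngine, firstDayOther });
```

If only one engine is wanted, one option (an assumption, not something proposed in this thread) is to place the engine ahead of the date parts in the key, e.g. [id, engine, year, month, day], so that the range varies only in its trailing elements.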

Now, this is working well and returning a range of docs.  The tricky part
(at least for me as a CouchDB n00b) is that each of these docs also has an
'ad_domain' value, and what I want is to generate a simple histogram that
groups all of the docs in this range by the ad_domains found.  So what I want
to get back at the end of the day is something like:

{'foo.com' => 12, 'bar.com' => 32, 'baz.com' => 14}

This is what I am stuck on, as I have not wrapped my head around map/reduce
in CouchDB quite yet.  I think this would be easy if ALL of the parts of the
key that I was querying on were the same: I could just emit the ad_domain at
the end of the complex key.  But since I am querying on a range of keys,
I end up with a reduce spitting out multiple entries for 'foo.com', since
that domain was found across several days (unique complex keys) of results.

Is it possible to do what I want?

Here is a work-in-progress map/reduce, which doesn't do what I want yet.
Help in modifying it would be much appreciated (and tips for doing what I
am already doing, only better!):

 view_by :ad_domain_histogram,
     :map =>
       'function(doc) {
         if( (doc["couchrest-type"] == "SearchDocument") &&
             doc.category_id && doc.searched_at && doc.engine) {
           var domain;
           if(doc.ad_domain == null){
             domain="NONE";
           }else{
             domain=doc.ad_domain;
           }
           emit([doc.category_id,
                 new Date(Date.parse(doc.searched_at)).getFullYear(),
                 new Date(Date.parse(doc.searched_at)).getMonth() + 1,
                 new Date(Date.parse(doc.searched_at)).getDate(),
                 doc.engine],
                 domain);
         }
       }',
       :reduce =>
         "function(keys, values, rereduce) {
           return sum(values);
         }"

Thanks very much in advance,

Glenn
