incubator-couchdb-user mailing list archives

From Kevin Ferguson <ke...@meebo-inc.com>
Subject RE: Size of view file
Date Mon, 26 Oct 2009 20:29:06 GMT
For #1, have you considered a view like:

MAP:
function(doc) {
         var datetime = doc.created_at;
         var year = parseInt(datetime.substr(0, 4), 10);
         var month = parseInt(datetime.substr(5, 2), 10);
         var day = parseInt(datetime.substr(8, 2), 10);
         // emit 1 per document so the reduce can sum the counts
         emit([year, month, day, doc.user_agent], 1);
       }
REDUCE:
function(k,v,r) { return sum(v); }

Then you can query with startkey=[y,m,d], endkey=[y,m,d,{}], group=true and get the count
for each user-agent on that day.  I think the output will be smaller too, but I don't know
a whole lot about the view engine internals.
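
To make the idea concrete, here is a rough sketch in plain JavaScript that imitates what the view would return for one day. It does not talk to CouchDB at all: the sample documents, the "UA-A"/"UA-B" strings, and the helper names (groupedCounts, rowsForDay, dayQuery) are made up for illustration, and the grouping loop is only a stand-in for what reduce with group=true does on the server.

```javascript
// Plain-JavaScript simulation of the suggested view; no CouchDB required.
// The sample documents below are invented for illustration.
const docs = [
  { created_at: "2009/10/11 13:02:46 +0000", user_agent: "UA-A" },
  { created_at: "2009/10/11 14:10:00 +0000", user_agent: "UA-A" },
  { created_at: "2009/10/11 15:30:12 +0000", user_agent: "UA-B" },
  { created_at: "2009/10/12 09:00:00 +0000", user_agent: "UA-A" },
];

// Same key layout as the map function above, emitting 1 per document.
function map(doc, emit) {
  const datetime = doc.created_at;
  const year = parseInt(datetime.substr(0, 4), 10);
  const month = parseInt(datetime.substr(5, 2), 10);
  const day = parseInt(datetime.substr(8, 2), 10);
  emit([year, month, day, doc.user_agent], 1);
}

// Stand-in for reduce + group=true: sum the values per distinct full key.
function groupedCounts(documents) {
  const totals = new Map();
  for (const doc of documents) {
    map(doc, (key, value) => {
      const k = JSON.stringify(key);
      totals.set(k, (totals.get(k) || 0) + value);
    });
  }
  return [...totals].map(([k, value]) => ({ key: JSON.parse(k), value }));
}

// Keep only the rows for one day, which is what the
// startkey=[y,m,d] .. endkey=[y,m,d,{}] range selects.
function rowsForDay(rows, y, m, d) {
  return rows.filter(r => r.key[0] === y && r.key[1] === m && r.key[2] === d);
}

// Build the query string; the key parameters are JSON and must be URL-encoded.
function dayQuery(y, m, d) {
  const startkey = encodeURIComponent(JSON.stringify([y, m, d]));
  const endkey = encodeURIComponent(JSON.stringify([y, m, d, {}]));
  return `startkey=${startkey}&endkey=${endkey}&group=true`;
}

const rows = rowsForDay(groupedCounts(docs), 2009, 10, 11);
console.log(rows);
console.log(dayQuery(2009, 10, 11));
```

Each row's value is a single number per [year, month, day, user_agent] key, rather than one object per day that holds every user-agent string.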

Kevin

________________________________________
From: Ryan Richins [richinsr@mac.com]
Sent: Monday, October 26, 2009 1:24 PM
To: user@couchdb.apache.org
Subject: Size of view file

I am working on a project where I have 12k documents and the size of
the db is 11MB but the view file is over 4GB.  Obviously I am doing
something wrong with my views to make the file so large.  I was hoping
to get some input as to where my problem might be.

Running couchdb 0.90

Each document has 3 attributes, one of which is 'User Agent'.  For each
attribute I have the following views defined:
"by_<attribute>_total_date" and "by_<attribute>_created_at".  Below
is the code for the 2 views that deal with User Agent.  The same code
is used to define the views for the other 2 attributes, except that
doc.user_agent is replaced by doc.<attribute>.

My guess is the problem lies somewhere in the
"by_<attribute>_total_date" since every other view I have returns NULL
for the value.

#1 by_ua_total_date
-----------------
MAP:
function(doc) {
         var val = {};
         var datetime = doc.created_at;
         var year = parseInt(datetime.substr(0, 4), 10);
         var month = parseInt(datetime.substr(5, 2), 10);
         var day = parseInt(datetime.substr(8, 2), 10);
         val[doc.user_agent] = 1;
         emit([year, month, day], val );
       }

REDUCE:
function (keys, values, rereduce) {
         var rv = {};
         for (var i in values) {
           var value = values[i];
           for (var k in value) {
             rv[k] = (rv[k] || 0) + value[k];
           }
         }
         return rv;
       }


EXAMPLE OUTPUT (Key, Value)
[2009, 9, 6], {Mozilla/5.0 (iPod; U; CPU iPhone OS 2_2_1 like Mac OS
X; en-us) AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1
Mobile/5H11a Safari/525.20: 5, Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1; SV1; FunWebProducts; InfoPath.2; .NET CLR 2.0.50727;
OfficeLiveConnector.1.3; OfficeLivePatch.0.0): 2, Mozilla/5.0 (iPod;
U; CPU iPhone OS 2_2 like Mac OS X; en-us) AppleWebKit/525.18.1
(KHTML, like Gecko) Version/3.1.1 Mobile/5G77a Safari/525.20: 2,
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; YPC
3.2.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR
3.0.04506.30; .NET CLR 3.0.04506.648; InfoPath.2): 1 }
-----------------

#2 by_ua_created_at
----------------
MAP:
function(doc) {
     emit([doc['user_agent'], doc['created_at']], null);
}
EXAMPLE OUTPUT (Key, Value)
["8900a/1.2 Mozilla/4.0 (compatible; MSIE 6.0; Windows CE; IEMobile
7.6)", "2009/10/11 13:02:46 +0000"], NULL
----------------



Going through Futon to view my data, it does not seem like it should be
4GB worth, but I must be missing something.  Any insight would be very much
appreciated.

Thanks,

Ryan



