Message-ID: <4A43ABB2.8020103@interactivemediums.com>
Date: Thu, 25 Jun 2009 11:54:10 -0500
From: dave farkas <dave@interactivemediums.com>
To: user@couchdb.apache.org
Subject: design doc file size

Hi,

The company I work for is attempting to migrate two messaging systems
from MySQL to CouchDB. CouchDB will be used for reporting and searching
messages. Once we have the current data loaded, new messages will be
added once per day and existing messages will not be updated.

I currently have the smaller of the two loaded into CouchDB and it has
8M documents for a total file size on disk of 19G. We have created 8
design docs (typically with two views in each). The total size of the
files generated for these is 46G. The second system is about three times
the size of the smaller one, so I'm expecting the couch database size to
be about 60G and the total design doc size to be about 150G.
Unfortunately, the server we were planning to use won't have enough free
disk space for our current messages, let alone new ones. Are there any
ways to compact the design document files, or best practices for
reducing their size? Also, is there a way to cancel or stop a view from
indexing once it starts?

Here is a typical example of our map/reduce functions (the generated
file size for this is 7.3G on disk). We're mainly calculating stats by
different criteria over time (messages per account per minute, hour,
day, month, year, etc.):

map.js

function(doc) {
  // Only index archived messages that have accounts and messages.
  if (doc['couchrest-type'] == 'ArchivedMessage' && doc.accounts && doc.messages) {
    if (doc.accounts.length > 0) {
      var account_id = doc.accounts[0].account_id;
      doc.messages.forEach(function(message) {
        // Pull the date parts out of the timestamp at fixed offsets.
        var datetime = message.created_at_utc;
        var year = parseInt(datetime.substr(0, 4), 10);
        var month = parseInt(datetime.substr(5, 2), 10);
        var day = parseInt(datetime.substr(8, 2), 10);
        var hour = parseInt(datetime.substr(11, 2), 10);
        var minute = parseInt(datetime.substr(14, 2), 10);
        // One count for this message's type, plus one for the running total.
        var message_type_count = {};
        message_type_count[message.message_type] = 1;
        message_type_count['total'] = 1;
        emit([account_id, year, month, day, hour, minute], message_type_count);
      });
    }
  }
}
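
To make the emitted rows concrete, here is a minimal sketch that runs the same map logic outside CouchDB against a made-up document. The `emit` stub, the `rows` array, the sample document, and the timestamp format are all assumptions for illustration; inside CouchDB, `emit` is supplied by the view server.

```javascript
// Stand-in for CouchDB's emit(): collect rows in an array (illustration only).
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same logic as map.js, given a name so it can be called directly.
function map(doc) {
  if (doc['couchrest-type'] == 'ArchivedMessage' && doc.accounts && doc.messages) {
    if (doc.accounts.length > 0) {
      var account_id = doc.accounts[0].account_id;
      doc.messages.forEach(function(message) {
        var datetime = message.created_at_utc;
        var year = parseInt(datetime.substr(0, 4), 10);
        var month = parseInt(datetime.substr(5, 2), 10);
        var day = parseInt(datetime.substr(8, 2), 10);
        var hour = parseInt(datetime.substr(11, 2), 10);
        var minute = parseInt(datetime.substr(14, 2), 10);
        var message_type_count = {};
        message_type_count[message.message_type] = 1;
        message_type_count['total'] = 1;
        emit([account_id, year, month, day, hour, minute], message_type_count);
      });
    }
  }
}

// Hypothetical document; every value here is made up.
const sampleDoc = {
  'couchrest-type': 'ArchivedMessage',
  accounts: [{ account_id: 42 }],
  messages: [
    { message_type: 'sms', created_at_utc: '2009-06-25 11:54:10' },
    { message_type: 'email', created_at_utc: '2009-06-25 11:54:40' }
  ]
};

map(sampleDoc);
// One row per message, each keyed by [account, year, month, day, hour, minute].
console.log(JSON.stringify(rows, null, 2));
```

Each message produces its own row carrying a small object as the value, so a document with many messages contributes many rows to the index.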

reduce.js

function(keys, values, rereduce) {
  // Sum the per-type counts across all values. Reduced values have the
  // same shape as mapped values, so rereduce needs no special handling.
  var mt_count = {};
  for (var i = 0; i < values.length; i++) {
    var utc_count = values[i];
    for (var key in utc_count) {
      var count = utc_count[key];
      if (!mt_count[key]) {
        mt_count[key] = count;
      } else {
        mt_count[key] += count;
      }
    }
  }
  return mt_count;
}
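
As a quick sanity check on the reduce, here is a small sketch that feeds it a few hand-written rows grouped by key prefix, roughly imitating what a `group_level` query on the view returns. The `rollup` helper and the sample rows are hypothetical; CouchDB does this grouping itself on a real view.

```javascript
// Same logic as reduce.js, given a name so it can be called directly.
function reduce(keys, values, rereduce) {
  var mt_count = {};
  for (var i = 0; i < values.length; i++) {
    var utc_count = values[i];
    for (var key in utc_count) {
      mt_count[key] = (mt_count[key] || 0) + utc_count[key];
    }
  }
  return mt_count;
}

// Hypothetical helper: group rows by the first `level` key elements,
// then reduce the values in each group.
function rollup(rows, level) {
  var groups = {};
  rows.forEach(function(row) {
    var prefix = JSON.stringify(row.key.slice(0, level));
    (groups[prefix] = groups[prefix] || []).push(row.value);
  });
  var result = {};
  Object.keys(groups).forEach(function(prefix) {
    result[prefix] = reduce(null, groups[prefix], false);
  });
  return result;
}

// Made-up rows in the shape the map function emits.
var rows = [
  { key: [42, 2009, 6, 25, 11, 54], value: { sms: 1, total: 1 } },
  { key: [42, 2009, 6, 25, 11, 54], value: { email: 1, total: 1 } },
  { key: [42, 2009, 6, 25, 12, 3], value: { sms: 1, total: 1 } }
];

var perMinute = rollup(rows, 6); // account + year..minute
var perDay = rollup(rows, 4);    // account + year, month, day
console.log(perMinute);
console.log(perDay);
```

With these rows, the per-day rollup collapses all three messages into a single entry, while the per-minute rollup keeps two separate entries.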

Thanks,
Dave