From: dave farkas
To: user@couchdb.apache.org
Date: Thu, 25 Jun 2009 11:54:10 -0500
Subject: design doc file size
Message-ID: <4A43ABB2.8020103@interactivemediums.com>

Hi,

The company I work for is attempting to migrate two messaging systems from MySQL to CouchDB. CouchDB will be used for reporting and searching messages. Once we have the current data loaded, new messages will be added once per day and existing messages will not be updated.

I currently have the smaller of the two systems loaded into CouchDB. It has 8M documents, for a total file size on disk of 19G. We have created 8 design docs (typically with two views in each), and their total size is 46G. The second system is about three times the size of the smaller one, so I'm expecting the CouchDB database to be about 60G and the design docs to total about 150G. Unfortunately, the server we were planning to use won't have enough free disk space for our current messages, let alone new ones.

Are there any ways to compact design documents, or best practices for reducing their file size? Also, is there a way to cancel or stop a view from indexing once it starts?

Here is a typical example of our map/reduce functions (the generated file size for this is 7.3G on disk).
We're mainly calculating stats by different criteria over time (messages per account per minute, day, month, year, etc.):

map.js

    function(doc) {
      if (doc['couchrest-type'] == 'ArchivedMessage' && doc.accounts && doc.messages) {
        if (doc.accounts.length > 0) {
          // Stats are attributed to the first account on the document.
          var account_id = doc.accounts[0].account_id;
          doc.messages.forEach(function(message) {
            // Split the timestamp into numeric parts by character position.
            var datetime = message.created_at_utc;
            var year = parseInt(datetime.substr(0, 4), 10);
            var month = parseInt(datetime.substr(5, 2), 10);
            var day = parseInt(datetime.substr(8, 2), 10);
            var hour = parseInt(datetime.substr(11, 2), 10);
            var minute = parseInt(datetime.substr(14, 2), 10);

            // One counter for this message's type, plus a running total.
            var message_type_count = {};
            message_type_count[message.message_type] = 1;
            message_type_count['total'] = 1;

            emit([account_id, year, month, day, hour, minute], message_type_count);
          });
        }
      }
    }

reduce.js

    function(keys, values, rereduce) {
      // Sum the per-type counters; the same logic serves reduce and rereduce.
      var mt_count = {};
      for (var i = 0; i < values.length; i++) {
        var utc_count = values[i];
        for (var key in utc_count) {
          var count = utc_count[key];
          if (!mt_count[key]) {
            mt_count[key] = count;
          } else {
            mt_count[key] += count;
          }
        }
      }
      return mt_count;
    }

Thanks,
Dave
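
P.S. In case it helps anyone reproduce this outside CouchDB, here is a rough sketch of how the functions above behave on a single document. It keeps the same logic as the map and reduce functions above, with the local variables declared explicitly. Everything else is an assumption for illustration: the emit() stub, the names map, reduce and sampleDoc, the sample field values, and the "YYYY-MM-DD HH:MM:SS" timestamp layout (chosen only because it matches the substr offsets used in the map function).

```javascript
// Stand-in for CouchDB's emit(): collects rows so they can be inspected.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same logic as map.js above, given a name so it can be called directly.
function map(doc) {
  if (doc['couchrest-type'] == 'ArchivedMessage' && doc.accounts && doc.messages) {
    if (doc.accounts.length > 0) {
      var account_id = doc.accounts[0].account_id;
      doc.messages.forEach(function(message) {
        var datetime = message.created_at_utc;
        var year = parseInt(datetime.substr(0, 4), 10);
        var month = parseInt(datetime.substr(5, 2), 10);
        var day = parseInt(datetime.substr(8, 2), 10);
        var hour = parseInt(datetime.substr(11, 2), 10);
        var minute = parseInt(datetime.substr(14, 2), 10);
        var message_type_count = {};
        message_type_count[message.message_type] = 1;
        message_type_count['total'] = 1;
        emit([account_id, year, month, day, hour, minute], message_type_count);
      });
    }
  }
}

// Same logic as reduce.js above.
function reduce(keys, values, rereduce) {
  var mt_count = {};
  for (var i = 0; i < values.length; i++) {
    var utc_count = values[i];
    for (var key in utc_count) {
      var count = utc_count[key];
      if (!mt_count[key]) {
        mt_count[key] = count;
      } else {
        mt_count[key] += count;
      }
    }
  }
  return mt_count;
}

// Hypothetical sample document; all values are invented for illustration.
var sampleDoc = {
  'couchrest-type': 'ArchivedMessage',
  accounts: [{ account_id: 42 }],
  messages: [
    { created_at_utc: '2009-06-25 16:54:10', message_type: 'sms' },
    { created_at_utc: '2009-06-25 16:54:40', message_type: 'email' },
    { created_at_utc: '2009-06-25 16:55:02', message_type: 'sms' }
  ]
};

map(sampleDoc);

// Reduce over every emitted value, as a group_level=0 query would.
var values = rows.map(function(row) { return row.value; });
var totals = reduce(null, values, false);

console.log(JSON.stringify(rows));
console.log(JSON.stringify(totals));
```

With this sample, the map function emits one row per message, each keyed down to the minute and each carrying a small object as its value, and the reduce function sums those objects per message type.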