Date: Mon, 6 Apr 2009 01:38:12 -0700 (PDT)
From: "Enda Farrell (JIRA)"
To: dev@couchdb.apache.org
Subject: [jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

[ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696006#action_12696006 ]

Enda Farrell commented on COUCHDB-220:
--------------------------------------

I have been trying out the operational behaviour of the 0.8x release and noticed something similar to the original posting. The filesystem is also ext3, but the scenario differs in that no attachments were involved.

When 1.5 million 9 KB documents were added *in a random fashion*, the .couch file grew to 110 GB. After compaction, it shrank to a more expected 14 GB.

A similar test will be run again soon using the 0.9x release.

/e

* "In a random fashion" means that the key of each document within the single database was a Perl random number. Four writers were populating the database, and some key collisions were expected.

> Extreme sparseness in couch files
> ---------------------------------
>
>                 Key: COUCHDB-220
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-220
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>         Environment: ubuntu 8.10 64-bit, ext3
>            Reporter: Robert Newson
>         Attachments: 220.patch, attachment_sparseness.js
>
>
> When adding ten thousand documents, each with a small attachment, the discrepancy between reported file size and actual file size becomes huge:
> ls -lh shard0.couch
> 698M 2009-01-23 13:42 shard0.couch
> du -sh shard0.couch
> 57M shard0.couch
> On filesystems that do not support write holes, this will cause an order of magnitude more I/O.
> I think it was introduced by the streaming attachment patch, as each attachment is followed by huge swathes of zeroes when viewed with 'hd -v'.
> Compacting this database reduced it to 7.8 MB, indicating other sparseness besides attachments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
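The ls -lh versus du -sh comparison in the original report can also be scripted, which makes it easier to repeat the 0.8x and 0.9x tests and compare runs. What follows is a minimal Python sketch, not part of CouchDB or of 220.patch. It assumes a POSIX system where os.stat exposes st_blocks, which Python documents as a count of 512-byte blocks. The helper names disk_usage, sparseness_ratio, make_sparse_file and report, and the script name sparseness.py, are hypothetical.

```python
import os
import sys
import tempfile


def disk_usage(path: str) -> tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for one file.

    apparent_bytes is st_size, the length `ls -l` shows. allocated_bytes
    is st_blocks * 512, the space actually allocated on disk, which is
    roughly what `du` shows. st_blocks is POSIX-only (not on Windows).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def sparseness_ratio(path: str) -> float:
    """Apparent size over allocated size; a ratio well above 1 suggests holes."""
    apparent, allocated = disk_usage(path)
    if allocated == 0:
        return float("inf")
    return apparent / allocated


def make_sparse_file(path: str, hole_bytes: int) -> None:
    """Hypothetical demo helper: create a file that is a hole plus one byte."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)  # move past the end without writing anything
        f.write(b"\x01")    # one real byte; the skipped range is left as a hole


def report(path: str) -> None:
    """Print apparent size, allocated size and their ratio for one file."""
    apparent, allocated = disk_usage(path)
    ratio = sparseness_ratio(path)
    print(f"{path}: apparent={apparent} allocated={allocated} ratio={ratio:.1f}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # e.g. python sparseness.py shard0.couch
        for p in sys.argv[1:]:
            report(p)
    else:
        # No arguments: demonstrate on a freshly created sparse file.
        with tempfile.TemporaryDirectory() as d:
            demo = os.path.join(d, "demo.bin")
            make_sparse_file(demo, 64 * 1024 * 1024)
            report(demo)
```

For the figures quoted from the issue (698M apparent, 57M allocated), the ratio would be about 12. Whether a seek past the end actually leaves a hole depends on the filesystem, which is the point of the "write holes" remark above. As a rough cross-check of the 0.8x numbers, 1.5 million documents of about 9,000 bytes each come to about 13.5 GB of document data, in line with the 14 GB compacted size rather than the 110 GB before compaction.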