couchdb-dev mailing list archives

From "Enda Farrell (JIRA)" <>
Subject [jira] Commented: (COUCHDB-220) Extreme sparseness in couch files
Date Mon, 06 Apr 2009 08:38:12 GMT


Enda Farrell commented on COUCHDB-220:

I have been trying out the operational behaviour of the 0.8x release and noticed something
similar to the original posting.

The filesystem type is ext3, but the scenario is different in that there were no attachments
involved. When 1.5 million 9 KB docs were added *in a random fashion*, the .couch file ended up
at 110 GB. After compaction, this reduced to a more expected 14 GB.

A similar test will be run again soon using the 0.9x release.


* "in a random fashion" means that the key within the single database is a Perl random
number. Four writers were populating the DB, and some key collisions were expected.

> Extreme sparseness in couch files
> ---------------------------------
>                 Key: COUCHDB-220
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>         Environment: ubuntu 8.10 64-bit, ext3
>            Reporter: Robert Newson
>         Attachments: 220.patch, attachment_sparseness.js
> When adding ten thousand documents, each with a small attachment, the discrepancy between reported file size and actual file size becomes huge:
> ls -lh shard0.couch
> 698M 2009-01-23 13:42 shard0.couch
> du -sh shard0.couch
> 57M	shard0.couch
> On filesystems that do not support write holes, this will cause an order of magnitude more I/O.
> I think it was introduced by the streaming attachment patch, as each attachment is followed by huge swathes of zeroes when viewed with 'hd -v'.
> Compacting this database reduced it to 7.8 MB, indicating other sparseness besides attachments.
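The zero runs Robert describes are characteristic of writes issued past the current end of file. A minimal sketch (plain Python, not CouchDB code) that reproduces the ls/du discrepancy on a filesystem with hole support:

```python
import os
import tempfile

# Seeking far past EOF before writing leaves a hole: the skipped range
# reads back as zeroes (what 'hd -v' shows) but occupies no disk blocks.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.seek(64 * 1024 * 1024)   # skip 64 MB without writing anything
    f.write(b"payload")        # only this tail gets real blocks

st = os.stat(path)
print("apparent :", st.st_size)           # ~64 MB, what ls -l reports
print("allocated:", st.st_blocks * 512)   # a few KB, what du reports

with open(path, "rb") as f:
    assert f.read(16) == b"\x00" * 16     # the hole reads as zeroes
os.remove(path)
```

On ext3 the holes cost little beyond the metadata, which is consistent with du reporting 57M while ls reports 698M; copying such a file through a tool that does not detect holes would write all the zeroes out, which is the extra I/O the issue warns about.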

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
