couchdb-dev mailing list archives

From "Terin Stock (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-2102) Downstream replicator database bloat
Date Sun, 09 Mar 2014 17:41:43 GMT


Terin Stock commented on COUCHDB-2102:

While I also can't give you the database file, as it is over 150 GB on disk, I can share the following:

1. Rather standard compile of CouchDB 1.5
2. local.ini configured as follows (see the section-layout sketch after this list):

public_fields = appdotnet, avatar, avatarMedium, avatarLarge, date, email, fields, freenode, fullname, github, homepage, name, roles, twitter, type, _id, _rev
users_db_public = true

secure_rewrites = false

delayed_commits = false

3. Set up replication with:

  curl -X POST http://localhost:5984/_replicator \
    -d '{"_id":"fullfatdb","source":"","target":"registry","continuous":true,"user_ctx":{"name":"admin","roles":["_admin"]}}' \
    -H "Content-Type: application/json"

> Downstream replicator database bloat
> ------------------------------------
>                 Key: COUCHDB-2102
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public (Regular issues)
>          Components: Replication
>            Reporter: Isaac Z. Schlueter
> When I do continuous replication from one db to another, I get a lot of bloat over time.
> For example, replicating a _users db with a relatively low level of writes, and around
> 30,000 documents, the size on disk of the downstream replica was over 300MB after 2 weeks.
> I compacted the DB, and the size dropped to about 20MB (slightly smaller than the source).
> Of course, I realize that I can configure compaction to happen regularly.  But this still
> seems like a rather excessive tax.  It is especially shocking to users who are replicating
> a 100GB database full of attachments, and find it grow to 400GB if they're not careful!  You
> can easily end up in a situation where you don't have enough disk space to successfully compact.
> Is there a fundamental reason why this happens?  Or has it simply never been a priority?
> It'd be awesome if replication were more efficient with disk space.
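For anyone measuring this on their own replicas, the on-disk size can be checked and a one-off compaction triggered through the standard CouchDB HTTP API; the "registry" database name and admin:password credentials below are placeholders, not values from the original reports:

  # GET /db reports disk_size (bytes on disk) alongside doc_count
  curl http://localhost:5984/registry

  # POST /db/_compact starts compaction; it needs admin rights and a JSON Content-Type
  curl -X POST http://admin:password@localhost:5984/registry/_compact \
    -H "Content-Type: application/json"

CouchDB 1.2 and later also ship an automatic compaction daemon that can be enabled in local.ini, which avoids running this by hand, though compaction still needs enough free disk to rewrite the database file.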

This message was sent by Atlassian JIRA
