Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
Date: Tue, 11 Mar 2014 21:26:44 +0000 (UTC)
From: "Isaac Z. Schlueter (JIRA)" <jira@apache.org>
To: dev@couchdb.apache.org
Message-ID: <JIRA.12697148.1393347179975.46957.1394573204169@arcas>
In-Reply-To: <JIRA.12697148.1393347179975@arcas>
References: <JIRA.12697148.1393347179975@arcas>
Subject: [jira] [Commented] (COUCHDB-2102) Downstream replicator database
 bloat
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/COUCHDB-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930975#comment-13930975 ] 

Isaac Z. Schlueter commented on COUCHDB-2102:
---------------------------------------------

Also, it's very strange that attachments would be the cause of this, since the skimdb exhibits this behavior as well, even on an initial replication (which would be attachment-free).

> Downstream replicator database bloat
> ------------------------------------
>
>                 Key: COUCHDB-2102
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2102
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>          Components: Replication
>            Reporter: Isaac Z. Schlueter
>
> When I do continuous replication from one db to another, I get a lot of bloat over time.
> For example, when I replicated a _users db with a relatively low level of writes and around 30,000 documents, the size on disk of the downstream replica was over 300MB after 2 weeks.  I compacted the DB, and the size dropped to about 20MB (slightly smaller than the source database).
> Of course, I realize that I can configure compaction to happen regularly.  But this still seems like a rather excessive tax.  It is especially shocking to users who are replicating a 100GB database full of attachments and find that it grows to 400GB if they're not careful!  You can easily end up in a situation where you don't have enough disk space to successfully compact.
> Is there a fundamental reason why this happens?  Or has it simply never been a priority?  It'd be awesome if replication were more efficient with disk space.
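
One rough way to quantify the bloat described above is to compare the file size with the live data size that CouchDB reports in the database-info document (GET on the database URL). The sketch below is only an illustration. The helper names (size_fields, bloat_ratio, fetch_db_info) are made up for this example. The field names assume the CouchDB 1.x layout (disk_size / data_size), with a fallback for the nested sizes.file / sizes.active layout used by later releases. The sample numbers reuse the 300MB and 20MB figures from the report, treating the post-compaction size as an approximation of the live data.

```python
import json
from urllib.request import urlopen


def size_fields(info):
    """Return (file_bytes, live_bytes) from a database-info document.

    Assumes either the CouchDB 1.x layout (disk_size / data_size) or
    the nested layout of later releases (sizes.file / sizes.active).
    """
    if "sizes" in info:
        return info["sizes"]["file"], info["sizes"]["active"]
    return info["disk_size"], info["data_size"]


def bloat_ratio(info):
    """How many times larger the file is than the live data it holds."""
    file_bytes, live_bytes = size_fields(info)
    if not live_bytes:
        # Live size missing or zero: avoid dividing by zero.
        return float("inf") if file_bytes else 1.0
    return file_bytes / live_bytes


def fetch_db_info(db_url):
    """GET the database-info document, e.g. http://localhost:5984/_users."""
    with urlopen(db_url) as resp:
        return json.load(resp)


# Illustrative numbers from the report: ~300MB on disk, ~20MB after compaction.
MB = 1024 * 1024
example = {"disk_size": 300 * MB, "data_size": 20 * MB}
print(round(bloat_ratio(example), 1))
```

Against a live server, bloat_ratio(fetch_db_info(url)) could be polled on the downstream replica to see how quickly the ratio climbs between compactions.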

