couchdb-dev mailing list archives

From Adam Kocoloski <>
Subject Re: Cluster Replication batch_size and batch_count Modification
Date Mon, 05 Jun 2017 23:05:17 GMT
The answer to your clarifying question is absolutely yes. The “pending_changes” metric
refers to the number of committed changes on the shard replica emitting the log event that
need to be cross-checked on another replica. It’s not a measure of writes that need to be

Cheers, Adam

> On Jun 5, 2017, at 4:37 PM, Phil May <> wrote:
> Hi Adam,
> Thanks for the info!
> When we run at high write rates, we will start to fall behind, but when we
> reduce the rate, we eventually catch up.
> I have a clarification question – can the warning messages we are seeing
> still occur in a healthy cluster due to the "redundant cross-check" taking
> long enough that more changes have accumulated that now also need to be
> cross-checked (even when no actual writes were needed)?
> We have had some luck modifying sync_concurrency (which is exposed in the
> .ini file) and batch_size (which we exposed), and that does give us more
> throughput capacity.
> Thanks!
> - Phil
> On Mon, Jun 5, 2017 at 11:38 AM, Adam Kocoloski <> wrote:
>> Hi Phil,
>> Here’s the thing to keep in mind about those warning messages: in a
>> healthy cluster, the internal replication traffic that generates them is
>> really just a redundant cross-check. It exists to “heal” a cluster member
>> that was down during some write operations. When you write data into a
>> CouchDB cluster the copies are written to all relevant shard replicas
>> proactively.
>> If your cluster’s steady-state write load is causing internal cluster
>> replication to fall behind permanently, that’s problematic. You should tune
>> the cluster replication parameters to give it more throughput. If the
>> replication is only falling behind during some batch data load and then
>> catches up later, it may be a different story. You may want to keep things
>> configured as-is.
>> Does that make sense?
>> Cheers, Adam
>>> On Jun 4, 2017, at 11:06 PM, Phil May <> wrote:
>>> I'm writing to check whether modifying the replication batch_count and
>>> batch_size parameters for cluster replication is a good idea.
>>> Some background – our data platform dev team noticed that under heavy
>>> write load, cluster replication was falling behind. The following warning
>>> messages started appearing in the logs, and the pending_changes value
>>> consistently increased while under load.
>>> [warning] 2017-05-18T20:15:22.320498Z couch-1@couch-1.couchdb <0.316.0>
>>> -------- mem3_sync shards/a0000000-bfffffff/test.1495137986
>>> couch-3@couch-3.couchdb
>>> {pending_changes,474}
>>> What we saw is described in COUCHDB-3421
>>> <>. In addition, CouchDB appears to be CPU bound while this is occurring,
>>> not I/O bound as would seem reasonable to expect for replication.
>>> When we looked into this, we discovered in the source two values affecting
>>> replication, batch_size and batch_count. For cluster replication, these
>>> values are fixed at 100 and 1 respectively, so we made them configurable.
>>> We tried various values and it seems increasing the batch_size (and to a
>>> lesser extent batch_count) improves our write performance. As a point of
>>> reference, with batch_count=50 and batch_size=5000 we can handle about
>>> double the write throughput with no warnings. We are experimenting with
>>> other values.
>>> We wanted to know if adjusting these parameters is a sound approach.
>>> Thanks!
>>> - Phil
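
The thread distinguishes two situations: internal replication that falls behind permanently under steady-state load, and replication that falls behind during a burst and later catches up. A minimal Python sketch for telling them apart from the logs follows. It assumes warnings shaped like the mem3_sync line quoted above, ending in a `{pending_changes,N}` tuple. The function names and the "last five samples never decrease" heuristic are illustrative assumptions, not part of CouchDB, and the sketch does not separate samples by shard or target node.

```python
import re
from typing import Iterable, List, Optional

# Matches the {pending_changes,N} tuple seen in the quoted mem3_sync warning.
PENDING_RE = re.compile(r"\{pending_changes,\s*(\d+)\}")


def pending_changes(line: str) -> Optional[int]:
    """Return the pending_changes value in a log line, or None if absent."""
    match = PENDING_RE.search(line)
    return int(match.group(1)) if match else None


def extract_series(lines: Iterable[str]) -> List[int]:
    """Collect pending_changes values in the order they appear in the log."""
    values = (pending_changes(line) for line in lines)
    return [v for v in values if v is not None]


def is_falling_behind(series: List[int], window: int = 5) -> bool:
    """Hypothetical heuristic: True when the last `window` samples never
    decrease and the newest sample is higher than the oldest in the window."""
    recent = series[-window:]
    if len(recent) < window:
        return False  # not enough samples to judge a trend
    never_decreases = all(b >= a for a, b in zip(recent, recent[1:]))
    return never_decreases and recent[-1] > recent[0]


if __name__ == "__main__":
    # Sample values are made up for illustration, apart from the 474
    # that appears in the quoted warning.
    sample = [
        "mem3_sync shards/a0000000-bfffffff/test.1495137986 {pending_changes,474}",
        "unrelated log line",
        "mem3_sync shards/a0000000-bfffffff/test.1495137986 {pending_changes,512}",
    ]
    print(extract_series(sample))
```

Feeding the extracted series to `is_falling_behind` while the cluster is under its normal write load gives a rough signal of whether the backlog is growing or draining. A real check would group samples per shard and per target node before looking at the trend.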
