couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Cluster Replication batch_size and batch_count Modification
Date Mon, 05 Jun 2017 16:38:20 GMT
Hi Phil,

Here’s the thing to keep in mind about those warning messages: in a healthy cluster, the
internal replication traffic that generates them is really just a redundant cross-check. It
exists to “heal” a cluster member that was down during some write operations. When you
write data into a CouchDB cluster, the copies are written to all relevant shard replicas proactively.
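To make the "redundant cross-check" idea concrete, here is a toy Python model. It is not how mem3 actually works: replicas are hypothetical dicts mapping doc ID to revision, and `write_to_cluster` and `sync_pass` are illustrative names only. A write goes to every replica that is up; a later sync pass copies only what the target is missing, so in a healthy cluster it finds nothing to do.

```python
"""Toy model of internal replication as a redundant cross-check.

Hypothetical: replicas are plain dicts mapping doc_id -> revision string.
This is NOT mem3's implementation; it only illustrates the idea.
"""
from typing import Dict, List

Replica = Dict[str, str]


def write_to_cluster(replicas: List[Replica], live: List[bool], doc_id: str, rev: str) -> None:
    """Write a document to every replica that is currently up."""
    for replica, is_up in zip(replicas, live):
        if is_up:
            replica[doc_id] = rev


def sync_pass(source: Replica, target: Replica) -> int:
    """Copy any doc the target lacks or holds at a different rev; return how many were copied."""
    copied = 0
    for doc_id, rev in source.items():
        if target.get(doc_id) != rev:
            target[doc_id] = rev
            copied += 1
    return copied


if __name__ == "__main__":
    a, b, c = {}, {}, {}
    replicas = [a, b, c]
    # Healthy cluster: every replica receives the write, so the sync pass copies nothing.
    write_to_cluster(replicas, [True, True, True], "doc1", "1-abc")
    print(sync_pass(a, c))  # 0
    # Replica c is down during a write; a later sync pass "heals" it.
    write_to_cluster(replicas, [True, True, False], "doc2", "1-def")
    print(sync_pass(a, c))  # 1
```

The sync pass here is pure redundancy whenever every write reached every replica; it only does real work after an outage.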

If your cluster’s steady-state write load is causing internal cluster replication to fall
behind permanently, that’s problematic. You should tune the cluster replication parameters
to give it more throughput. If the replication is only falling behind during some batch data
load and then catches up later, it may be a different story. In that case you may want to keep things configured
as-is.
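A toy backlog model makes the distinction between steady-state load and a temporary batch load concrete. It assumes a fixed replication capacity per tick and uses made-up write rates; `backlog` is a hypothetical helper, not a CouchDB API.

```python
"""Toy backlog model: does internal replication fall behind permanently?

Assumption: writes arrive as a number of changes per tick, and the replicator
can apply at most `capacity` changes per tick. All numbers are made up.
"""
from typing import Iterable, List


def backlog(writes_per_tick: Iterable[int], capacity: int) -> List[int]:
    """Return the pending-change count after each tick."""
    pending = 0
    history = []
    for writes in writes_per_tick:
        # New writes add to the backlog; the replicator drains up to `capacity`.
        pending = max(0, pending + writes - capacity)
        history.append(pending)
    return history


if __name__ == "__main__":
    # Steady-state load above capacity: the backlog grows without bound.
    print(backlog([120] * 5, capacity=100))  # [20, 40, 60, 80, 100]
    # Burst load that later subsides: the backlog drains back to zero.
    print(backlog([300] * 2 + [50] * 8, capacity=100))
    # [200, 400, 350, 300, 250, 200, 150, 100, 50, 0]
```

Only the first pattern calls for retuning; in the second, the backlog clears on its own once the load drops below capacity.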

Does that make sense?

Cheers, Adam

> On Jun 4, 2017, at 11:06 PM, Phil May <phil.may@motorolasolutions.com> wrote:
> 
> I'm writing to check whether modifying the replication batch_count and
> batch_size parameters for cluster replication is a good idea.
> 
> Some background – our data platform dev team noticed that under heavy write
> load, cluster replication was falling behind. The following warning
> messages started appearing in the logs, and the pending_changes value
> consistently increased while under load.
> 
> [warning] 2017-05-18T20:15:22.320498Z couch-1@couch-1.couchdb <0.316.0> -------- mem3_sync shards/a0000000-bfffffff/test.1495137986 couch-3@couch-3.couchdb {pending_changes,474}
> 
> What we saw is described in COUCHDB-3421
> <https://issues.apache.org/jira/browse/COUCHDB-3421>. In addition, CouchDB
> appears to be CPU-bound while this is occurring, not I/O-bound as one would
> expect for replication.
> 
> When we looked into this, we found two values in the source that affect
> replication: batch_size and batch_count. For cluster replication, these
> values are fixed at 100 and 1 respectively, so we made them configurable.
> We tried various values, and it seems that increasing batch_size (and, to a
> lesser extent, batch_count) improves our write performance. As a point of
> reference, with batch_count=50 and batch_size=5000 we can handle about
> double the write throughput with no warnings. We are experimenting with
> other values.
> 
> We wanted to know if adjusting these parameters is a sound approach.
> 
> Thanks!
> 
> - Phil
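One way to check whether pending_changes is climbing steadily, as described in the quoted message, is to group the warnings by shard range and target node and look at each series separately. This sketch assumes each warning occupies a single line in the raw log file (the archive wraps it) and that the `mem3_sync <shard> <target> {pending_changes,N}` layout matches the excerpt; `pending_by_pair` and `growing_pairs` are hypothetical helpers.

```python
"""Group mem3_sync pending_changes warnings by (shard, target) pair.

Assumption: one warning per log line, laid out like the quoted excerpt.
"""
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WARN_RE = re.compile(r"mem3_sync\s+(\S+)\s+(\S+)\s+\{pending_changes,(\d+)\}")

Pair = Tuple[str, str]


def pending_by_pair(lines: Iterable[str]) -> Dict[Pair, List[int]]:
    """Collect pending_changes values, in log order, for each (shard, target) pair."""
    series: Dict[Pair, List[int]] = defaultdict(list)
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            shard, target, count = match.group(1), match.group(2), int(match.group(3))
            series[(shard, target)].append(count)
    return dict(series)


def growing_pairs(series: Dict[Pair, List[int]], min_samples: int = 3) -> List[Pair]:
    """Pairs whose backlog strictly increased across at least `min_samples` warnings."""
    return [
        pair
        for pair, counts in series.items()
        if len(counts) >= min_samples and all(b > a for a, b in zip(counts, counts[1:]))
    ]
```

A pair that keeps showing up in `growing_pairs` during ordinary load is the "falling behind permanently" case; a pair that grows only during a bulk load and then stops warning is the other one.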

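Rough arithmetic on why raising either parameter helps drain a backlog. This assumes, without checking against the mem3 source, that one replication pass moves at most batch_size * batch_count changes; `changes_per_pass` and `passes_to_drain` are hypothetical helpers, and the values are the defaults and trial settings from the thread plus the 474 pending changes in the quoted warning.

```python
"""Back-of-the-envelope: how many passes drain a given backlog?

Simplifying assumption (not verified against mem3): one pass pushes at most
batch_size * batch_count changes to the target replica.
"""


def changes_per_pass(batch_size: int, batch_count: int) -> int:
    """Upper bound on changes moved in one pass under the stated assumption."""
    if batch_size <= 0 or batch_count <= 0:
        raise ValueError("batch_size and batch_count must be positive")
    return batch_size * batch_count


def passes_to_drain(pending: int, batch_size: int, batch_count: int) -> int:
    """Passes needed to clear `pending` changes if no new writes arrive (ceiling division)."""
    per_pass = changes_per_pass(batch_size, batch_count)
    return (pending + per_pass - 1) // per_pass


if __name__ == "__main__":
    # Defaults cited in the thread vs. the trial values Phil reported.
    print(passes_to_drain(474, batch_size=100, batch_count=1))    # 5
    print(passes_to_drain(474, batch_size=5000, batch_count=50))  # 1
```

The arithmetic says nothing about the cost of a single pass (memory, CPU); it only shows why per-pass throughput rises with either parameter.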
