From: Phil May
Date: Mon, 5 Jun 2017 15:37:08 -0500
Subject: Re: Cluster Replication batch_size and batch_count Modification
To: dev@couchdb.apache.org

Hi Adam,

Thanks for the info! When we run at high write rates we start to fall
behind, but when we reduce the rate we eventually catch up.

I have a clarification question: can the warning messages we are seeing
still occur in a healthy cluster because the "redundant cross-check"
takes long enough that more changes accumulate in the meantime and now
also need to be cross-checked (even when no actual writes were needed)?

We have had some luck modifying sync_concurrency (which is exposed in
the .ini file) and batch_size (which we exposed ourselves), and that
does give us more throughput capacity.

Thanks!

- Phil
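For concreteness, a tuning like the one described above might look
roughly as follows in a node's local.ini. This is a sketch, not shipped
configuration: sync_concurrency is the knob the message says is already
exposed, while the [mem3] section placement, the sync_batch_* key names,
and the sync_concurrency value stand in for whatever a local patch
actually registers. The batch values are the reference numbers from the
quoted message below.

    [mem3]
    ; concurrent internal replication jobs (already exposed per this thread)
    sync_concurrency = 20
    ; hypothetical keys for the locally patched batch parameters
    sync_batch_size = 5000
    sync_batch_count = 50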
On Mon, Jun 5, 2017 at 11:38 AM, Adam Kocoloski wrote:

> Hi Phil,
>
> Here's the thing to keep in mind about those warning messages: in a
> healthy cluster, the internal replication traffic that generates them
> is really just a redundant cross-check. It exists to "heal" a cluster
> member that was down during some write operations. When you write data
> into a CouchDB cluster, the copies are written to all relevant shard
> replicas proactively.
>
> If your cluster's steady-state write load is causing internal cluster
> replication to fall behind permanently, that's problematic, and you
> should tune the cluster replication parameters to give it more
> throughput. If the replication only falls behind during a batch data
> load and then catches up later, it may be a different story, and you
> may want to keep things configured as-is.
>
> Does that make sense?
>
> Cheers, Adam
>
> > On Jun 4, 2017, at 11:06 PM, Phil May wrote:
> >
> > I'm writing to check whether modifying the batch_count and batch_size
> > parameters for cluster replication is a good idea.
> >
> > Some background: our data platform dev team noticed that under heavy
> > write load, cluster replication was falling behind. The following
> > warning messages started appearing in the logs, and the
> > pending_changes value consistently increased while under load.
> >
> > [warning] 2017-05-18T20:15:22.320498Z couch-1@couch-1.couchdb <0.316.0>
> > -------- mem3_sync shards/a0000000-bfffffff/test.1495137986
> > couch-3@couch-3.couchdb
> > {pending_changes,474}
> >
> > What we saw is described in COUCHDB-3421. In addition, CouchDB
> > appears to be CPU bound while this is occurring, not I/O bound as one
> > would expect for replication.
> >
> > When we looked into this, we found two values in the source that
> > affect replication: batch_size and batch_count. For cluster
> > replication, these values are fixed at 100 and 1 respectively, so we
> > made them configurable. We tried various values, and increasing
> > batch_size (and, to a lesser extent, batch_count) improves our write
> > performance. As a point of reference, with batch_count=50 and
> > batch_size=5000 we can handle about double the write throughput with
> > no warnings. We are experimenting with other values.
> >
> > We wanted to know whether adjusting these parameters is a sound
> > approach.
> >
> > Thanks!
> >
> > - Phil
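For anyone reproducing the experiment, here is a minimal sketch of the
kind of source change the quoted message describes: replacing the
hard-coded batch parameters with .ini lookups. The module, function,
and config key names are hypothetical, and the snippet assumes it runs
inside a CouchDB node where the config application (and its
config:get_integer/3 helper) is available; the real constants live in
CouchDB's mem3 internal-replication code and may be shaped differently.

    %% Hypothetical sketch: expose the internal replication batch
    %% parameters through the config system instead of hard-coding them.
    %% Module, function, and key names are illustrative, not shipped code.
    -module(mem3_batch_tuning).
    -export([batch_opts/0]).

    batch_opts() ->
        %% Defaults preserve the fixed values mentioned in this thread
        %% (100 and 1), so behavior is unchanged until an operator sets
        %% the keys in the .ini.
        BatchSize  = config:get_integer("mem3", "sync_batch_size", 100),
        BatchCount = config:get_integer("mem3", "sync_batch_count", 1),
        [{batch_size, BatchSize}, {batch_count, BatchCount}].

Keeping the stock values as defaults makes a change like this inert
until it is explicitly tuned, which is the usual convention for new
CouchDB config options.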