couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jan Lehnardt <>
Subject Re: Shards per node
Date Sat, 07 Jan 2017 18:25:59 GMT
Hi Stefan,

there are all great questions :)

> On 27 Dec 2016, at 20:21, Stefan du Fresne <> wrote:
> Hi,
> I’ve been looking into CouchDB 2’s clustering works, and have a question about performance
characteristics of sharding.
> As I understand it, when you first configure a cluster you pick the number of shards.
Each node has to have at least one shard on it, and each shard can be duplicated N times,
where N best practice is considered 2-3x. How many shards you have is decided on DB creation
time, and if you need more later (because maxnodes = shards * replicas) you need to replicate
into a new cluster.

Default N 8 (your N here is q in CouchDB/Dynamo parlance:

> I’m wondering if:
> - There is a recommended number of shards to use, or a recommended range to stay in

It varies a bit on the low end. E.g. I have a low-perf dual core server with spinning disks.
And I’m running this with q=2, as I’m not getting any more IOPS out of the drives and
CPU time from the dual cores for most of my operations. Also, this cluster will never have
to grow, it is a single node q=2 setup.

On the high end, I’ll have to leave this to the Cloudant folks, as they have the most experience
in there. With a modern SSD-backed server, I’d do at least one shard per CPU core, if not
2x or even 3x depending on amount of views.

> - If there is any known performance characteristics that map to how many shards you have.
e.g., how differently would a one node “cluster” perform with 2 shards compared to 16
or even 256. Is there any harm in configuring your cluster with 16 shards say, even if you
aren’t planning on having 16-32 nodes any time soon.

This sadly currently doesn’t exist, but if somebody is up for writing a little performance
benchmark suite that we can run against a range of clusters from q=1 to q=256, I’d love
to see it :)

> While replicating into a larger cluster is safe, we have a lot (thousands) of PouchDB
clients who would have to re-download the entire changes feed to continue replicating. As
these clients are on spotty / slow / expensive connections, so ideally we’d be able to get
the number of shards right first time.

Good point. It’d be great if we had a better story for this. Maybe some clever hackery with
host names and the cluster UUID can work? cc @rnewson @wohali @davisp @chewbranca

Professional Support for Apache CouchDB:

View raw message