couchdb-user mailing list archives

From kowsik <>
Subject Re: Scaling CouchDB
Date Fri, 22 Apr 2011 18:47:07 GMT
On Fri, Apr 22, 2011 at 11:40 AM, Jim Klo <> wrote:
> I'm part of the core Federal Learning Registry dev team [], and we're
> using CouchDB to store and replicate contents of the registry within
> our network.
>
> One of the questions that has come up as we start to make plans for
> our initial production release is the scalability strategy of CouchDB.
> We expect that, long term, we are going to have an enormous amount of
> data from activity streams and metadata inserted into the network, and
> I'd like to have an idea of what we need to work towards now so there's
> no big surprise when we start getting close to hitting some limits.
>
> As part of our infrastructure strategy, we've chosen Amazon Web
> Services EC2 & EBS as our hosting provider for the first rollout. EBS
> currently has an upper limit of 1TB per volume; other cloud or
> non-cloud solutions may have similar or different limitations, but
> I'm only concerned right now with how we might deal with this on EC2
> and EBS.

Hopefully not on the Virginia region. :)

> 1. Are there CouchDB limits that we are going to run into before we
> hit 1TB?
> 2. Is there a strategy for disk spanning to go beyond the 1TB limit by
> incorporating multiple volumes, or do we need to leverage a solution
> like BigCouch, which seems to require us to spin up multiple CouchDB
> instances and do some sort of sharding/partitioning of data? I'm
> curious how queries that span shards/partitions work, or whether this
> is transparent.

This might help:

We are running CouchDB (not BigCouch) clusters across multiple regions
with full master-to-master replication. We are also in the process of
distributing our apps across regions with DNS failover; each region
is going to pick the CouchDB closest to it and let eventual
consistency kick in.
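
For readers who have not set this up: a master-to-master link between two
CouchDB nodes is usually just two continuous replications, one in each
direction. Below is a minimal Python sketch of the two request bodies one
might POST to `/_replicate`. The host names and the database name are
made-up placeholders, not the setup described above.

```python
import json


def master_master_bodies(node_a, node_b, db):
    """Return the two _replicate request bodies that together form a
    bidirectional (master-to-master) link between two CouchDB nodes."""

    def body(source, target):
        return {
            "source": f"{source}/{db}",
            "target": f"{target}/{db}",
            "continuous": True,  # keep replicating as new changes arrive
        }

    # One replication per direction: A -> B and B -> A.
    return [body(node_a, node_b), body(node_b, node_a)]


# Hypothetical node URLs and database name, for illustration only.
bodies = master_master_bodies(
    "http://couch-us-east.example.com:5984",
    "http://couch-eu-west.example.com:5984",
    "registry",
)

for b in bodies:
    # Each body would be sent as the JSON payload of a POST to /_replicate.
    print(json.dumps(b))
```

With more than two regions, the same helper could be called once per pair
of nodes; whether a full mesh or a ring is the better topology depends on
the deployment.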

That said, given that is just a month old, we haven't
quite hit the 1TB limit yet. I suspect that as we get close to that
point, sharding will become important, with filtered changes
replicating only select "global" data (like site stats and whatnot)
across the cluster. During the recent EC2 outage, the CouchDB instances
stayed alive across multiple regions, even though our PaaS provider on
the east coast went down. Things are back now!
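
To make the "filtered changes" idea concrete: CouchDB lets a replication
name a filter function stored in a design document on the source, and only
documents that pass the filter are replicated. The sketch below is one
possible shape for that, under stated assumptions: the `scope` field, the
design document name, and the URLs are all hypothetical.

```python
import json

# Hypothetical design document holding the replication filter. CouchDB
# filter functions are written in JavaScript and stored as strings.
FILTER_DDOC = {
    "_id": "_design/replication",
    "filters": {
        # Assumes each document carries a "scope" field (an invented
        # convention for this sketch) marking it as global or shard-local.
        "global_only": 'function(doc, req) { return doc.scope === "global"; }'
    },
}


def filtered_replication_body(source, target,
                              filter_name="replication/global_only"):
    """Build a _replicate request body that copies only the documents
    accepted by the named filter ("<design doc>/<filter>")."""
    return {
        "source": source,
        "target": target,
        "continuous": True,
        "filter": filter_name,
    }


# Hypothetical shard and peer URLs, for illustration only.
body = filtered_replication_body(
    "http://couch-us-east.example.com:5984/shard_a",
    "http://couch-eu-west.example.com:5984/shard_b",
)
print(json.dumps(body, indent=2))
```

The design document would need to be saved in the source database before
the replication starts. Each shard would then keep its own data locally
and share only the documents marked as global.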

