couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: How many nodes can couchdb scale to?
Date Mon, 28 Feb 2011 14:22:56 GMT
On Feb 28, 2011, at 3:39 AM, niall el-assaad wrote:

> Hi Isaac,
> 
> Thanks for the reply. I've put some more info inline:
> 
>> 
>> 
>> * Replication topology: is the plan to have replication from the branch office
>> nodes to your centralized data center? (n:1)
>> 
> 
> We would have a single box at each branch office replicating with two boxes
> at the data centre (for resilience).
> The data centre boxes would then replicate with each other.
> 
> So if some data was inserted at the remote branch, it would be replicated to
> the data centre, and the data centre would replicate it to the other 1999
> branches.
> 
> 
>> * Replication type: continuous or triggered manually/programmatically?
>> 
> 
> The ideal would be continuous.
> 
> 
>> * Scope of data set: I would be more concerned with writes than reads.
>> You'll need to have an idea of what your current aggregate average and
>> peak writes per second are, how much data is written for a given
>> period of time, and how far you think you will need this rate to scale in
>> the future.
>> 
> 
> I would expect the average to be around 10 writes per second, with the peak
> at about 100 writes per second.
> 
> 
>> * Why Couch: is CouchDB going to be addressing a brand new need, or
>> is it going to replace existing systems for known reasons? If it's the
>> latter, what is it about your current systems that isn't meeting your
>> demands, and what do you hope Couch will provide that will fill the gap?
>> (Specifically looking for performance data that you might have already
>> collected, and if Couch is going to be living on your existing hardware
>> or new hardware.)
>> 
> 
> It's for a completely new project; the main driver for looking at CouchDB is
> the ability to have a very large scale cluster with write capabilities in
> each branch. Mainly so that if there is a communications failure between the
> branch and the data centre everything can continue to work, then sync up
> later.
> 
>> 
>> I haven't dealt with large distributed Couch systems, but my instinct
>> would be that Couch wouldn't have any problem with a 2000:1 replicated
>> system. (See Ubuntu One as an example of a large CouchDB system with
>> many external replicators.) The ability to handle it would come down
>> to how well the aggregate data set matches the size of hardware and
>> replication layout in your data center, and of course available
>> ingress bandwidth.
>> 
> 
> Understood, we would scale the hardware and bandwidth accordingly based upon
> testing of the application.
> 
> thanks,
> 
> niall

Hi Niall,

I think the key point is that with this topology your central servers are going to need to
support a sustained throughput of 20,000 reads/second (your average of 10 writes/second, each
read once per branch server) in order to distribute the updates to all 2,000 servers. Granted,
each read is repeated 2,000 times, so you'll mostly be reading from page cache, but a cached
read from CouchDB is not nearly as cheap as a read from e.g. Varnish.
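
As a rough sketch of that arithmetic: the snippet below multiplies the aggregate write rate by the number of branch servers that each have to read every update. The function and constant names are illustrative only (nothing here is a CouchDB API), and it assumes each update is read exactly once per branch, rounding 1,999 recipients up to 2,000.

```python
def fanout_reads_per_second(writes_per_second: int, branch_count: int) -> int:
    """Reads/second the central servers must serve so that every branch
    receives every update (assumption: one read per update per branch)."""
    return writes_per_second * branch_count


BRANCH_COUNT = 2000    # branch servers, from the thread
AVERAGE_WRITES = 10    # aggregate average writes/second, from the thread
PEAK_WRITES = 100      # aggregate peak writes/second, from the thread

sustained_reads = fanout_reads_per_second(AVERAGE_WRITES, BRANCH_COUNT)
peak_reads = fanout_reads_per_second(PEAK_WRITES, BRANCH_COUNT)

print(f"sustained: {sustained_reads:,} reads/second")
print(f"peak:      {peak_reads:,} reads/second")
```

By the same estimate, the quoted peak of 100 writes/second would translate to about 200,000 reads/second at the data centre, so it is worth sizing for the peak as well as the average.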

Adam