couchdb-dev mailing list archives

From Damien Katz <>
Subject Re: Partitioned Clusters
Date Fri, 20 Feb 2009 19:55:20 GMT

On Feb 20, 2009, at 1:55 PM, Stefan Karpinski wrote:

> Hi, I thought I'd introduce myself since I'm new here on the couchdb
> list. I'm Stefan Karpinski. I've worked in the Monitoring Group at
> Akamai, Operations R&D at Citrix Online, and I'm nearly done with a
> PhD in computer networking at the moment. So I guess I've thought
> about this kind of stuff a bit ;-)
> I'm curious what the motivation behind a tree topology is. Not that
> it's not a viable approach, just why that and not a load-balancer in
> front of a bunch of "leaves" with lateral propagation between the
> leaves? Why should the load-balancing/proxying/caching node even be
> running couchdb?
> One reason I can see for a tree topology would be the hierarchical
> cache effect. But that would likely only make sense in certain
> circumstances. Being able to configure the topology to meet various
> needs, rather than enforcing one particular topology makes more sense
> to me overall.

Trees would be overkill except for very large clusters.

With CouchDB map views, you need to combine results from every node in
a big merge sort. If you combine all results at a single node, that
single node's ability to simultaneously pull and sort data from all
other nodes may become the bottleneck. So to parallelize, you have
multiple nodes each doing a merge sort of their sub-nodes, then sending
those results to another node to be combined further, etc. The same
goes for reduce views, but instead of a merge sort it's just
rereducing results. The natural "shape" of that computation is a tree.
Only the final root node at the top remains a bottleneck, but now
it has to maintain connections to, and merge the sorted values from,
far fewer nodes.
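
To make the shape of that computation concrete, here is a rough Python
sketch of a tree-shaped merge and rereduce. It is only an illustration,
not CouchDB code: the names tree_merge, tree_rereduce and fanout are
hypothetical, each "node" is simulated as an in-memory list of
(key, value) rows already sorted by key, and the rereduce function is
assumed to be associative so that partial results can be combined in
any grouping.

```python
import heapq


def merge_rows(sorted_lists):
    """Merge lists of (key, value) rows, each already sorted by key."""
    return list(heapq.merge(*sorted_lists, key=lambda row: row[0]))


def tree_merge(leaf_results, fanout=2):
    """Combine per-node view rows level by level.

    Each simulated intermediate node merges at most `fanout` children,
    so the root only has to merge a handful of inputs.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [list(rows) for rows in leaf_results]
    if not level:
        return []
    while len(level) > 1:
        level = [merge_rows(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def tree_rereduce(leaf_values, rereduce, fanout=2):
    """Same tree shape for reduce views: rereduce instead of merging."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(leaf_values)
    if not level:
        raise ValueError("need at least one partial reduction")
    while len(level) > 1:
        level = [rereduce(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


if __name__ == "__main__":
    leaves = [
        [("a", 1), ("d", 4)],
        [("b", 2)],
        [("c", 3), ("e", 5)],
        [("a", 9)],
    ]
    print(tree_merge(leaves, fanout=2))
    print(tree_rereduce([3, 5, 7, 11, 13], sum, fanout=2))
```

With N leaf nodes and a fanout of f, the root in this sketch merges at
most f inputs instead of N, at the cost of roughly log_f(N) levels of
intermediate nodes.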

