couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: Partitioned Clusters
Date Fri, 20 Feb 2009 19:55:20 GMT

On Feb 20, 2009, at 1:55 PM, Stefan Karpinski wrote:

> Hi, I thought I'd introduce myself since I'm new here on the couchdb
> list. I'm Stefan Karpinski. I've worked in the Monitoring Group at
> Akamai, Operations R&D at Citrix Online, and I'm nearly done with a
> PhD in computer networking at the moment. So I guess I've thought
> about this kind of stuff a bit ;-)
>
> I'm curious what the motivation behind a tree topology is. Not that
> it's not a viable approach, just why that and not a load-balancer in
> front of a bunch of "leaves" with lateral propagation between the
> leaves? Why should the load-balancing/proxying/caching node even be
> running couchdb?
>
> One reason I can see for a tree topology would be the hierarchical
> cache effect. But that would likely only make sense in certain
> circumstances. Being able to configure the topology to meet various
> needs, rather than enforcing one particular topology makes more sense
> to me overall.

Trees would be overkill except for very large clusters.

With CouchDB map views, you need to combine results from every node in
a big merge sort. If you combine all results at a single node, that
single node's ability to simultaneously pull and sort data from all
other nodes may become the bottleneck. So to parallelize, you have
multiple nodes each doing a merge sort of their sub-nodes, then sending
those results to another node to be combined further, etc. The same
goes for reduce views, but instead of a merge sort it's just
rereducing results. The natural "shape" of that computation is a tree.
Only the final root node at the top remains a bottleneck, but now it
has to maintain connections and merge the sorted values from far
fewer nodes.
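
To make the shape of that computation concrete, here is a minimal
sketch in Python. It is not CouchDB code: the function names, the
in-memory lists standing in for per-node view results, and the
`fanout` parameter are all assumptions made for illustration. Each
pass of the loop plays the role of one level of the tree, where every
intermediate node combines the output of at most `fanout` children.

```python
import heapq


def merge_sorted(streams):
    """k-way merge of (key, value) row streams that are already sorted by key."""
    return heapq.merge(*streams, key=lambda row: row[0])


def tree_merge(leaves, fanout=2):
    """Merge sorted map-view rows level by level (hypothetical helper).

    `leaves` holds one sorted row list per leaf node. No merging step ever
    reads from more than `fanout` inputs, including the final root step.
    """
    level = list(leaves)
    while len(level) > 1:
        level = [merge_sorted(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return list(level[0]) if level else []


def tree_rereduce(partials, rereduce, fanout=2):
    """Combine per-node reduce values level by level (hypothetical helper).

    `rereduce` takes a list of partial values and returns one value,
    for example the built-in `sum` for a counting view.
    """
    level = list(partials)
    if not level:
        raise ValueError("no partial results to rereduce")
    while len(level) > 1:
        level = [rereduce(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


if __name__ == "__main__":
    rows = [[("a", 1), ("d", 4)], [("b", 2)], [("c", 3), ("e", 5)]]
    print(tree_merge(rows, fanout=2))
    print(tree_rereduce([3, 5, 2, 7, 1], sum, fanout=2))
```

With a fanout of f and n leaves, the root only has to hold
connections to f children instead of n, at the cost of roughly
log_f(n) levels of intermediate nodes.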

-Damien

