couchdb-dev mailing list archives

From Stefan Karpinski <>
Subject Re: Partitioned Clusters
Date Fri, 20 Feb 2009 18:55:54 GMT
Hi, I thought I'd introduce myself since I'm new here on the couchdb
list. I'm Stefan Karpinski. I've worked in the Monitoring Group at
Akamai, Operations R&D at Citrix Online, and I'm nearly done with a
PhD in computer networking at the moment. So I guess I've thought
about this kind of stuff a bit ;-)

I'm curious what the motivation behind a tree topology is. Not that
it's not a viable approach, just why that and not a load-balancer in
front of a bunch of "leaves" with lateral propagation between the
leaves? Why should the load-balancing/proxying/caching node even be
running couchdb?

One reason I can see for a tree topology would be the hierarchical
cache effect. But that would likely only make sense in certain
circumstances. Being able to configure the topology to meet various
needs, rather than enforcing one particular topology, makes more sense
to me overall.

On 2/20/09, Robert Newson <> wrote:
> Any thoughts as to how (or even if) this tree-wise result aggregation
> would work for externals?
> I'm thinking specifically about couchdb-lucene, where multi-node
> results aggregation is possible, given a framework like you propose
> here. The results that couchdb-lucene produces can already be
> aggregated, assuming there's a hook for the merge function (actually,
> perhaps it's exactly reduce-shaped)...
> B.
> On Fri, Feb 20, 2009 at 3:12 AM, Chris Anderson <> wrote:
>> On Thu, Feb 19, 2009 at 6:39 PM, Ben Browning <> wrote:
>>> Overall the model sounds very similar to what I was thinking. I just
>>> have a few comments.
>>>> In this model documents are saved to a leaf node depending on a hash
>>>> of the docid. This means that lookups are easy, and need only to touch
>>>> the leaf node which holds the doc. Redundancy can be provided by
>>>> maintaining R replicas of every leaf node.
>>> There are several use-cases where a true hash of the docid won't be the
>>> optimal partitioning key. The simple case is where you want to partition
>>> your data by user and in most non-trivial cases you won't be storing
>>> all of a user's data under one document with the user's id as the docid.
>>> A fairly simple solution would be allowing the developer to specify a
>>> javascript function somewhere (not sure where this should live...) that
>>> takes a docid and spits out a partition key. Then I could just prefix
>>> all my doc ids for a specific user with that user's id and write the
>>> appropriate partition function.
>>>> View queries, on the other hand, must be handled by every node. The
>>>> requests are proxied down the tree to leaf nodes, which respond
>>>> normally. Each proxy node then runs a merge sort algorithm (which can
>>>> sort in constant space proportional to # of input streams) on the view
>>>> results. This can happen recursively if the tree is deep.
>>> If the developer has control over partition keys as suggested above, it's
>>> entirely possible to have applications where view queries only need data
>>> from one partition. It would be great if we could do something smart here
>>> or have a way for the developer to indicate to Couch that all the data
>>> should be on only one partition.
>>> These are just nice-to-have features and the described cluster setup
>>> could still be extremely useful without them.
>> I think they are both sensible optimizations. Damien's described the
>> JS partition function before on IRC, so I think it fits into the
>> model. As far as restricting view queries to just those docs within a
>> particular id range, it might make sense to partition by giving each
>> user their own database, rather than logic on the docid. In the case
>> where you need data in a single db, but still have some queries that
>> can be partitioned, it's still a good optimization. Luckily even in the
>> unoptimized case, if a node has no rows to contribute to the final
>> view result then it should have a low impact on total resources needed
>> to generate the result.
>> Chris
>> --
>> Chris Anderson
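
The partition function Ben Browning describes in the quoted thread (take a docid, return a partition key, then prefix each doc id with the user's id) could look roughly like the sketch below. It is written in Python rather than JavaScript purely for illustration. The names `partition_key` and `leaf_for`, the ":" separator, and the use of MD5 are all assumptions made for this example; nothing here is an actual CouchDB API.

```python
import hashlib


def partition_key(doc_id: str, separator: str = ":") -> str:
    """Hypothetical partition function: the text before the first
    separator is treated as the user id. Doc ids without a separator
    fall back to the whole id, i.e. plain per-document hashing."""
    return doc_id.split(separator, 1)[0]


def leaf_for(doc_id: str, num_leaves: int) -> int:
    """Map a doc id to a leaf index by hashing its partition key.

    MD5 is used only as a stable, well-distributed hash (an assumption;
    the thread does not name a hash function)."""
    key = partition_key(doc_id)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_leaves
```

With this scheme, `leaf_for("alice:note-1", 4)` and `leaf_for("alice:note-2", 4)` always return the same leaf, so a view query restricted to one user would only need to touch that leaf.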

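The merge step that each proxy node performs on view results, as described in the quoted thread, is a k-way merge of already-sorted streams. A minimal Python sketch follows, assuming each leaf returns rows as `(key, value)` tuples sorted by key. The name `merge_view_results` is hypothetical, and Python's default ordering stands in for CouchDB's view collation, which is a simplification.

```python
import heapq
from typing import Any, Iterable, Iterator, Tuple

Row = Tuple[Any, Any]  # (view key, value); an assumed row shape


def merge_view_results(streams: Iterable[Iterable[Row]]) -> Iterator[Row]:
    """Lazily merge sorted row streams into one sorted stream.

    heapq.merge holds one pending row per input stream, so memory use
    grows with the number of streams, not the number of rows."""
    return heapq.merge(*streams, key=lambda row: row[0])


# Leaves that have no rows for the query contribute an empty stream.
leaf_a = [("apple", 1), ("cherry", 3)]
leaf_b = [("banana", 2), ("date", 4)]
leaf_c: list = []

merged = list(merge_view_results([leaf_a, leaf_b, leaf_c]))
```

Because the output is itself a sorted stream, a proxy higher in the tree can feed it into another `merge_view_results` call, which matches the recursive case for deeper trees.
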