couchdb-dev mailing list archives

From Ben Browning <>
Subject Re: Partitioned Clusters
Date Fri, 20 Feb 2009 02:39:54 GMT
Overall the model sounds very similar to what I was thinking. I just
have a few comments.

> In this model documents are saved to a leaf node depending on a hash
> of the docid. This means that lookups are easy, and need only to touch
> the leaf node which holds the doc. Redundancy can be provided by
> maintaining R replicas of every leaf node.

There are several use cases where a plain hash of the docid won't be the
optimal partitioning key. The simple case is partitioning your data by
user: in most non-trivial applications you won't be storing all of a
user's data in one document with the user's id as the docid. A fairly
simple solution would be to let the developer specify a JavaScript
function somewhere (not sure where this should live...) that takes a
docid and returns a partition key. Then I could just prefix all the doc
ids for a specific user with that user's id and write the appropriate
partition function.
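
A rough sketch of what I mean, assuming docids of the form
"<userid>:<rest>". The ":" separator, the function names, and the
FNV-1a-style hash are all just assumptions for illustration, not anything
Couch provides today:

```javascript
// Hypothetical partition function: maps a docid to a partition key.
// Assumes docids look like "<userId>:<rest>"; the ":" separator is an assumption.
function partitionKey(docId) {
  const sep = docId.indexOf(":");
  // Docids without a prefix fall back to the whole docid as the key.
  return sep === -1 ? docId : docId.slice(0, sep);
}

// FNV-1a-style 32-bit hash over UTF-16 code units.
// Any stable hash would do here; this one is only an example.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force an unsigned 32-bit result
}

// Pick a leaf partition from the partition key rather than the raw docid.
function partitionFor(docId, numPartitions) {
  return fnv1a(partitionKey(docId)) % numPartitions;
}
```

With this, "alice:profile" and "alice:settings" always land on the same
partition, because only the "alice" prefix is hashed.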

> View queries, on the other hand, must be handled by every node. The
> requests are proxied down the tree to leaf nodes, which respond
> normally. Each proxy node then runs a merge sort algorithm (which can
> sort in constant space proportional to # of input streams) on the view
> results. This can happen recursively if the tree is deep.
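
For reference, the merge step at a proxy node could be sketched roughly as
below. It assumes each child returns rows already sorted by key, and that
keys are plain numbers or strings compared with "<"; real view collation
is more involved. The function name and row shape are hypothetical:

```javascript
// Merge already-sorted row lists from child nodes into one sorted list.
// Rows are assumed to be objects with a `key` field (number or string).
// The only merge state is one cursor per input stream; a real proxy would
// stream rows to the client instead of collecting them in `merged`.
function mergeSortedRows(streams) {
  const cursors = streams.map(() => 0); // one position per input stream
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < streams.length; i++) {
      if (cursors[i] >= streams[i].length) continue; // stream exhausted
      if (best === -1 ||
          streams[i][cursors[i]].key < streams[best][cursors[best]].key) {
        best = i;
      }
    }
    if (best === -1) return merged; // every stream is exhausted
    merged.push(streams[best][cursors[best]++]);
  }
}
```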

If the developer has control over partition keys as suggested above, it's
entirely possible to have applications where a view query only needs data
from one partition. It would be great if we could do something smart here,
or have a way for the developer to tell Couch that all the data a query
needs lives on a single partition.
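
Something along these lines, where the proxy skips the fan-out when the
query carries a hint. The `partitionKey` query option and the helper names
are made up for illustration; nothing like this exists in Couch as far as
I know:

```javascript
// Example stable string hash (FNV-1a style); any stable hash would work.
function hashString(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Hypothetical routing helper for a proxy node.
// `query.partitionKey` is an assumed option, not an existing parameter.
function partitionsForViewQuery(query, numPartitions) {
  if (query.partitionKey !== undefined) {
    // Hint given: only the partition holding that key needs to answer.
    return [hashString(String(query.partitionKey)) % numPartitions];
  }
  // No hint: the query has to be proxied to every partition.
  return Array.from({ length: numPartitions }, (_, i) => i);
}
```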

These are just nice-to-have features and the described cluster setup could
still be extremely useful without them.

The tree setup sounds interesting, but I wonder how it would compare in
latency to a flat setup with the same number of leaf nodes. As long as the
developer can control the tree structure (# of children per parent), this
shouldn't be an issue, since a single proxy with every leaf as a child is
just the flat setup.
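
As a back-of-the-envelope check, the number of proxy levels a request
passes through depends only on the leaf count and the fan-out. The sketch
below assumes a balanced tree and that each level adds roughly one extra
network hop, which is my assumption rather than a measurement:

```javascript
// Number of proxy levels above the leaves for a balanced tree.
// leafCount: number of leaf nodes; fanout: children per parent (>= 2).
function proxyLevels(leafCount, fanout) {
  if (fanout < 2) throw new RangeError("fanout must be at least 2");
  let levels = 0;
  let nodes = leafCount;
  while (nodes > 1) {
    nodes = Math.ceil(nodes / fanout); // parents needed for this level
    levels++;
  }
  return levels;
}
```

For 64 leaves, proxyLevels(64, 4) gives 3 levels, proxyLevels(64, 8)
gives 2, and proxyLevels(64, 64) gives 1, which is the flat setup.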

- Ben
