couchdb-user mailing list archives

From: Joan Touzet <woh...@apache.org>
Subject: Re: distributed map/reduce in BigCouch
Date: Sun, 04 Aug 2013 14:18:07 GMT
Hi Stanley,

Let me provide a simplified explanation, and others can help refine it
as necessary.

On Sat, Aug 03, 2013 at 09:34:48AM -0700, Stanley Iriele wrote:
> How then does distributed map/reduce work?

Each BigCouch node that holds a shard of the database also builds and
holds the corresponding shard of the view. When a view is requested,
enough nodes are queried to cover every shard range, and the reduce
step is applied as the partial results are merged and returned.
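
To make that concrete, here is a rough sketch of what a client sees,
using Python's requests library; the node URL, the database 'mydb',
and the design document 'ddoc' with its view 'by_type' are all
hypothetical names. The request goes to whichever node you contact,
and that node does the shard fan-out and merge for you:

    import requests  # third-party HTTP client

    # Ask any node for the view; that node fans the request out to the
    # shard copies it needs and merges (and reduces) the partial results.
    resp = requests.get(
        "http://localhost:5984/mydb/_design/ddoc/_view/by_type",
        params={"group": "true"},  # one reduced row per distinct key
    )
    resp.raise_for_status()
    for row in resp.json()["rows"]:
        print(row["key"], row["value"])

The important point is that the client never addresses individual
shards; any node in the cluster can coordinate the request.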

> if not all nodes have replications of all things how does that coordination
> happen during view building?

Correct: not all nodes have copies of everything. If you ask a node
for a view on a database it does not host at all, it uses the
information in the partition map to forward the request to a node that
has at least one shard of the database in question; that node then
completes the scatter/gather request against the other nodes
participating in that database.
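
As a purely conceptual sketch of that coordination (not BigCouch's
actual internals; all names below are made up), the coordinating node
picks one replica per shard range from the partition map, gathers the
key-sorted map rows, and reduces per key:

    import heapq

    # Hypothetical partition map: shard key range -> nodes holding a replica.
    PARTITION_MAP = {
        (0x00000000, 0x7fffffff): ["node1", "node2"],
        (0x80000000, 0xffffffff): ["node2", "node3"],
    }

    def query_shard(node, shard_range):
        # Stand-in for the per-shard request sent to one replica; a real
        # node would return key-sorted map rows for just its shard.
        return [{"key": "a", "value": 1}, {"key": "b", "value": 1}]

    def scatter_gather(reduce_fn):
        # Scatter: one request per shard range, to any replica of that range.
        partials = [query_shard(nodes[0], rng)
                    for rng, nodes in PARTITION_MAP.items()]
        # Gather: merge the key-sorted partial rows, then reduce per key.
        merged = heapq.merge(*partials, key=lambda row: row["key"])
        grouped = {}
        for row in merged:
            grouped.setdefault(row["key"], []).append(row["value"])
        return {key: reduce_fn(vals) for key, vals in grouped.items()}

    print(scatter_gather(sum))  # -> {'a': 2, 'b': 2}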

> also its sharded right so certain nodes have a certain range of keys.

Right, see above.

> My problem is this. I need a solution that can incrementally scale across
> many hard disks...does big couch do this? with views and such?..if
> so..how?...and what are the drawbacks?

I wouldn't necessarily recommend running 1 BigCouch process per HD you
have on a single machine, but there's no reason that it wouldn't work.

The real challenge is that a database's partition map is determined
statically at database creation time. If you add more HDs later, you
will have to create a new database with more shards and then replicate
the data into it to make use of the new disks. The other option is to
start with a very high value for 'q', then rebalance the shard map
across the new disks and BigCouch processes as you add them. There is
a StackOverflow answer from Robert Newson that describes how to perform
that rebalancing.
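
For the first option, the rough shape over the HTTP API would be
something like the following; the node URL, database names, and the
'q' value are placeholders, and 'q' is fixed when the database is
created:

    import requests

    node = "http://localhost:5984"  # any node in the cluster

    # Create a new database with more shards; 'q' is fixed at creation time.
    requests.put(node + "/mydb_v2", params={"q": "24"}).raise_for_status()

    # Copy the existing data into the new, more finely sharded database.
    requests.post(
        node + "/_replicate",
        json={"source": node + "/mydb", "target": node + "/mydb_v2"},
    ).raise_for_status()

Once the replication has caught up, you'd point your application at the
new database (or swap names) and delete the old one.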

In short, neither option is trivial or automated. For a single-machine
system, you'd do far better to use some sort of logical volume
management to handle growing storage over time, such as Linux's LVM,
some hardware RAID cards, ZFS, or the similar features in OS X and
Windows.

> Thanks for any kind of response.
> 
> Regards,
> 
> Stanley

-- 
Joan Touzet | joant@atypical.net | wohali everywhere else
