couchdb-user mailing list archives

From Adam Kocoloski <adam.kocolo...@gmail.com>
Subject Re: distributed map/reduce in BigCouch
Date Sun, 04 Aug 2013 17:29:53 GMT
The 'q' value is the number of unique shards that comprise a single logical database.  BigCouch
will keep 'n' replicas of each shard, so in total a database will have q*n .couch files
associated with it.
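For example, with q=8 and n=3 you end up with 24 shard files. A toy Python sketch of
that arithmetic (illustrative only, not BigCouch code; the hex range names just mimic
the on-disk naming style):

    q = 8   # unique shards of the logical database
    n = 3   # replicas kept of each shard

    RING = 2 ** 32          # the key space, split into q equal hash ranges
    for i in range(q):
        start = i * RING // q
        end = (i + 1) * RING // q - 1
        print("shard range %08x-%08x" % (start, end))

    print("total .couch files for this database:", q * n)   # 24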

The number of nodes in the cluster is an independent concern; BigCouch will happily store
multiple shards on a single machine, or leave some machines in the cluster without any data
from a particular clustered database.  The number of nodes need only be greater than or equal
to the 'n' value of every database, because BigCouch will not store multiple copies of a shard
on the same node in the cluster.
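A minimal placement sketch in the same spirit (the round-robin assignment here is my
assumption for illustration, not BigCouch's actual algorithm), showing why you need at
least 'n' nodes:

    nodes = ["node1", "node2", "node3", "node4"]
    q, n = 4, 3

    assert len(nodes) >= n, "cluster must have at least n nodes"

    for shard in range(q):
        # the n copies of each shard land on n distinct nodes
        owners = [nodes[(shard + r) % len(nodes)] for r in range(n)]
        print("shard %d: %s" % (shard, owners))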

Adam

On Aug 4, 2013, at 4:32 PM, Stanley Iriele <siriele@breaktimestudios.com> wrote:

> Thanks, Joan,
> 
> I worded part of that poorly. When I said hard drives I really meant
> physical machines, but the bottleneck was disk space, so I said hard
> disks. That completely answered my question, actually; I appreciate it.
> What is this 'q' value? Is that the number of nodes? Or is that the
> r + w > n thing? Either way, that answers my questions. Thank you!
> On Aug 4, 2013 7:18 AM, "Joan Touzet" <wohali@apache.org> wrote:
> 
>> Hi Stanley,
>> 
>> Let me provide a simplistic explanation, and others can help refine it
>> as necessary.
>> 
>> On Sat, Aug 03, 2013 at 09:34:48AM -0700, Stanley Iriele wrote:
>>> How then does distributed map/reduce work?
>> 
>> Each BigCouch node with a shard of the database also keeps that shard of
>> the view. When a request is made for a view, enough nodes are queried to
>> cover every shard range, and the reduce operation is applied as the
>> results are returned and merged.
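>> 
>> A rough sketch of that gather step in Python (names and shapes here are
>> illustrative, not BigCouch internals; assume each shard returns rows
>> sorted by key with partially reduced values):
>> 
>>     from itertools import groupby
>>     from operator import itemgetter
>> 
>>     def sum_reduce(values, rereduce=False):
>>         return sum(values)   # behaves like the built-in _sum reduce
>> 
>>     def gather(shard_results, reduce_fn):
>>         # merge the per-shard rows, then rereduce values sharing a key
>>         merged = sorted((row for rows in shard_results for row in rows),
>>                         key=itemgetter(0))
>>         return [(key, reduce_fn([v for _, v in grp], rereduce=True))
>>                 for key, grp in groupby(merged, key=itemgetter(0))]
>> 
>>     print(gather([[("a", 2), ("b", 1)], [("a", 3)]], sum_reduce))
>>     # prints [('a', 5), ('b', 1)]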
>> 
>>> If not all nodes have replications of all things, how does that
>>> coordination happen during view building?
>> 
>> Correct: not all nodes have replicas of all things. If you ask a node
>> for a view on a database it does not have at all, it will use
>> information in the partition map to forward that question to a node
>> that has at least one shard of the database in question, which will in
>> turn complete the scatter/gather request to the other nodes
>> participating in that database.
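>> 
>> As a toy illustration of that partition-map lookup (the actual hash
>> function and map format are BigCouch internals; this only shows the
>> idea, with made-up node names):
>> 
>>     import zlib
>> 
>>     q = 4
>>     replicas = {             # hypothetical shard-range -> replica nodes
>>         0: ["node1", "node2"],
>>         1: ["node2", "node3"],
>>         2: ["node3", "node1"],
>>         3: ["node1", "node3"],
>>     }
>> 
>>     def owners(doc_id):
>>         h = zlib.crc32(doc_id.encode()) & 0xffffffff
>>         return replicas[h * q // 2 ** 32]   # which range the id hashes into
>> 
>>     print(owners("user:42"))   # any owner can serve or proxy the request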
>> 
>>> Also, it's sharded, right? So certain nodes have a certain range of keys.
>> 
>> Right, see above.
>> 
>>> My problem is this: I need a solution that can incrementally scale
>>> across many hard disks. Does BigCouch do this, with views and such? If
>>> so, how? And what are the drawbacks?
>> 
>> I wouldn't necessarily recommend running one BigCouch process per hard
>> disk on a single machine, but there's no reason it wouldn't work.
>> 
>> The real challenge is that a database's partition map is determined
>> statically at the time of database creation. If you choose to add more
>> disks after this creation time, you will have to create a new database
>> with more shards, then replicate data into this new database to use the
>> new disks. The other option is to choose a very high 'q' up front, then
>> rebalance the shard map onto the new disks and BigCouch processes as
>> they are added. There is a StackOverflow answer from Robert Newson that
>> describes the process for performing this operation.
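>> 
>> The first route looks roughly like this over the HTTP API (Python with
>> the 'requests' library; the URL, database names, and q/n values are
>> placeholders for your own cluster):
>> 
>>     import requests
>> 
>>     base = "http://localhost:5984"
>> 
>>     # create the replacement database with more shards
>>     requests.put(base + "/mydb_v2", params={"q": 16, "n": 3})
>> 
>>     # copy everything from the old database into it
>>     requests.post(base + "/_replicate",
>>                   json={"source": base + "/mydb",
>>                         "target": base + "/mydb_v2"})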
>> 
>> In short, neither is trivial nor automated. For a single-machine system,
>> you'd do far better to use some sort of logical volume manager to deal
>> with expanding storage over time, such as Linux's LVM, some hardware
>> RAID cards, ZFS, or similar features in OS X and Windows.
>> 
>>> Thanks for any kind of response.
>>> 
>>> Regards,
>>> 
>>> Stanley
>> 
>> --
>> Joan Touzet | joant@atypical.net | wohali everywhere else
>> 

