Subject: Re: distributed map/reduce in BigCouch
From: Stanley Iriele
To: user@couchdb.apache.org
Date: Sun, 4 Aug 2013 08:32:12 -0700
In-Reply-To: <20130804141807.GA1197@atypical.net>

Thanks, Joan. I worded part of that poorly: when I said hard drives I
really meant physical machines, but the bottleneck was disk space, so I
said hard disks. That completely answered my question, actually; I
appreciate it. What is this 'q' value? Is it the number of nodes, or is
it the r + w > n thing? Either way, that answers my questions. Thank you!

On Aug 4, 2013 7:18 AM, "Joan Touzet" wrote:

> Hi Stanley,
>
> Let me provide a simplistic explanation, and others can help refine it
> as necessary.
>
> On Sat, Aug 03, 2013 at 09:34:48AM -0700, Stanley Iriele wrote:
> > How then does distributed map/reduce work?
>
> Each BigCouch node with a shard of the database also keeps that shard
> of the view. When a request is made for a view, sufficient nodes are
> queried to retrieve the view result, with the reduce operation
> occurring as part of the return.
>
> > If not all nodes have replicas of all things, how does that
> > coordination happen during view building?
>
> This is not true: all nodes do not have replicas of all things. If you
> ask a node for a view on a database it does not have at all, it will
> use the information in the partition map to ask that question of a
> node that has at least one shard of the database in question, which
> will in turn complete the scatter/gather request to the other nodes
> participating in that database.
>
> > Also, it's sharded, right? So certain nodes have a certain range of
> > keys.
>
> Right, see above.
>
> > My problem is this: I need a solution that can incrementally scale
> > across many hard disks. Does BigCouch do this, with views and such?
> > If so, how, and what are the drawbacks?
>
> I wouldn't necessarily recommend running one BigCouch process per hard
> disk on a single machine, but there's no reason it wouldn't work.
>
> The real challenge is that a database's partition map is determined
> statically at the time of database creation. If you choose to add more
> disks after that creation time, you will have to create a new database
> with more shards, then replicate data to this new database to use the
> new disks. The other option would be to use a very high number for
> 'q', then rebalance the shard map onto the new disks and BigCouch
> processes. There is a StackOverflow answer from Robert Newson that
> describes the process for performing this operation.
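>
> As a rough sketch of that first path (the node address, database
> names, and q value below are placeholders, and this assumes your
> BigCouch build accepts q and n as query parameters at database
> creation time):
>
>     import json
>     import urllib.request
>
>     COUCH = "http://localhost:5984"  # placeholder node address
>
>     def call(method, path, body=None):
>         # Minimal helper around the HTTP API.
>         data = json.dumps(body).encode() if body is not None else None
>         req = urllib.request.Request(COUCH + path, data=data, method=method)
>         req.add_header("Content-Type", "application/json")
>         with urllib.request.urlopen(req) as resp:
>             return json.loads(resp.read())
>
>     # Create the replacement database with more shards (q) up front;
>     # q is fixed at creation time, which is the whole problem.
>     call("PUT", "/mydb_v2?q=16&n=3")
>
>     # Copy the data over, then point the application at the new name
>     # once it has caught up.
>     call("POST", "/_replicate", {"source": "mydb", "target": "mydb_v2"})
>
> The two calls are the easy part; the data copy and the cut-over to the
> new database name are what make it non-trivial.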
> In short, neither is trivial nor automated. For a single-machine
> system, you'd do far better to use some sort of logical volume manager
> to deal with expanding storage over time, such as Linux's LVM, some
> hardware RAID cards, ZFS, or similar features in OS X and Windows.
>
> > Thanks for any kind of response.
> >
> > Regards,
> >
> > Stanley
>
> --
> Joan Touzet | joant@atypical.net | wohali everywhere else
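
To make the original distributed map/reduce question concrete, here is
a minimal sketch of what a clustered view query looks like from the
client side (the node address, database, design document, and view
names are all made up; the request shape is the ordinary CouchDB view
API that BigCouch exposes):

    import json
    import urllib.request

    COUCH = "http://localhost:5984"  # placeholder node address

    # Querying a reduced view against a BigCouch cluster looks the same
    # as querying a single CouchDB node: the cluster consults the
    # partition map, fans the request out to sufficient shard copies,
    # and re-reduces the partial results before returning one response.
    url = COUCH + "/mydb/_design/stats/_view/by_type?group=true"
    with urllib.request.urlopen(url) as resp:
        for row in json.loads(resp.read())["rows"]:
            print(row["key"], row["value"])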