incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stanley Iriele <siri...@breaktimestudios.com>
Subject Re: distributed map/reduce in BigCouch
Date Mon, 05 Aug 2013 20:15:23 GMT
hey,

that...was tremendously helpful. I assumed thats was how it all worked but
in my head it seemed "To good to be true" so I had to really make sure. I
saw the news about cochdb merging the bigcouch code into the code base. So
I take it it is safe for me t start my project using couchdb 1.3.0 as I
have been doing and just stand up bigcouch when i need to.

I have to play with Bigcouch a little more. while I am here...what is the
difference between Bigcouch's clustering techniques and couchbase's XDCR? I
won't be using couchbase because it doesn't have the flexibility and
functionality that couchdb does. but I was generally curious what the
difference was.

again..many thanks...I can now go full steam ahead with my project
regards,

Stanley




On Sun, Aug 4, 2013 at 10:29 AM, Adam Kocoloski <adam.kocoloski@gmail.com>wrote:

> The 'q' value is the number of unique shards that comprise the single
> logical database.  BigCouch will keep 'n' replicas of each shard, so in
> total a database will have q*n .couch files associated with it.
>
> The number of nodes in the cluster is an independent concern; BigCouch
> will happily store multiple shards on a single machine or leave some
> machines in the cluster without any data from a particular clustered
> database.  The number of nodes must only be greater than or equal to the
> 'n' value for all databases (BigCouch will not store multiple copies of a
> shard on the same node in the cluster).
>
> Adam
>
> On Aug 4, 2013, at 4:32 PM, Stanley Iriele <siriele@breaktimestudios.com>
> wrote:
>
> > Thanks joan,
> >
> > I worded part of that poorly...When I said hard drives I really meant
> > physical machines.. But the bottleneck was disk space; so I said hard
> > disks... But That completely answered my question actually.... I
> appreciate
> > that...what is this 'q' value?..is that the number of nodes?.. Or is that
> > the r + w > n thing... Either way that answers questions thank you!
> > On Aug 4, 2013 7:18 AM, "Joan Touzet" <wohali@apache.org> wrote:
> >
> >> Hi Stanley,
> >>
> >> Let me provide a simplistic explanation, and others can help refine it
> >> as necessary.
> >>
> >> On Sat, Aug 03, 2013 at 09:34:48AM -0700, Stanley Iriele wrote:
> >>> How then does distributed map/reduce work?
> >>
> >> Each BigCouch node with a shard of the database also keeps that shard of
> >> the view. When a request is made for a view, sufficient nodes are
> >> queried to retrieve the view result, with the reduce operation occurring
> >> as part of the return.
> >>
> >>> if not all nodes have replications of all things how does that
> >> coordination
> >>> happen during view building?
> >>
> >> This is not true, all nodes do not have replications of all things. If
> >> you ask a node for a view on a database it does not have at all, it will
> >> use information in the partition map to ask that question of a node that
> >> has at least one shard of the database in question, which will in turn
> >> complete the scatter/gather request to other nodes participating in that
> >> database.
> >>
> >>> also its sharded right so certain nodes have a certain range of keys.
> >>
> >> Right, see above.
> >>
> >>> My problem is this. I need a solution that can incrementally scale
> across
> >>> many hard disks...does big couch do this? with views and such?..if
> >>> so..how?...and what are the drawbacks?
> >>
> >> I wouldn't necessarily recommend running 1 BigCouch process per HD you
> >> have on a single machine, but there's no reason that it wouldn't work.
> >>
> >> The real challenge is that a database's partition map is determined
> >> staticly at the time of database creation. If you choose to add more HDs
> >> after this creation time, you will have to create a new database with
> >> more shards, then replicate data to this new database to use the new
> >> disks. The other option would be to use a very high number for 'q', then
> >> rebalance the shard map onto the new disks and BigCouch processes. There
> >> is a StackOverflow answer from Robert Newson that describes the process
> >> for performing this operation.
> >>
> >> In short, neither is trivial nor automated. For a single-machine system,
> >> you'd do far better to use some sort of Logical Volume Manager to deal
> >> with expanding storage over time, such as Linux's lvm, some HW raid
> >> cards, ZFS or similar features in OSX and Windows.
> >>
> >>> Thanks for any kind of response.
> >>>
> >>> Regards,
> >>>
> >>> Stanley
> >>
> >> --
> >> Joan Touzet | joant@atypical.net | wohali everywhere else
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message