couchdb-user mailing list archives

From Jesse Hallett <halle...@gmail.com>
Subject Re: massive replication?
Date Mon, 26 Oct 2009 16:43:46 GMT
After reading some Wikipedia, it looks to me like Chris is right that an
information-spreading gossip protocol is the way to go.  You could have
each node replicate with a small number of other nodes at random - or with
its "neighbor" nodes - some fixed number of times per minute.  That should
distribute updates effectively.
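To make the idea concrete, here is a minimal sketch of one gossip round, with everything hedged as an assumption: the `FANOUT` constant, the `gossip_round` function, and the `replicate` callback are all hypothetical names, and in a real deployment `replicate` would wrap a POST of `{"source": ..., "target": ...}` to a CouchDB `/_replicate` endpoint.

```python
import random

FANOUT = 3  # assumed tunable: peers contacted per gossip round

def gossip_round(node, peers, replicate):
    """Pick a few random peers and replicate both ways with each.

    `node` is this node's own address, `peers` is the current peer
    list, and `replicate(source, target)` is a callback that would
    trigger one CouchDB replication in a real deployment.
    """
    candidates = [p for p in peers if p != node]
    chosen = random.sample(candidates, min(FANOUT, len(candidates)))
    for peer in chosen:
        replicate(node, peer)   # push our updates to the peer
        replicate(peer, node)   # pull the peer's updates
    return chosen
```

Run something like this on a timer on every node and updates spread in O(log n) rounds with high probability, which is the usual gossip argument.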

Then the problem comes down to maintaining a list of active peers.  One way
to do that would be to set up BitTorrent-style trackers: a handful of nodes
that are unlikely to go down very often.  Every Couch instance would register
itself with a tracker.
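A tracker along those lines could be very simple.  This is just an in-memory sketch (the `Tracker` class and its `ttl` parameter are invented for illustration); in practice the tracker state could itself live in a CouchDB database.

```python
import time

class Tracker:
    """In-memory sketch of a BitTorrent-style tracker for Couch nodes.

    Nodes announce themselves periodically; entries not refreshed
    within `ttl` seconds drop out of the peer list.
    """
    def __init__(self, ttl=300):
        self.ttl = ttl
        self.nodes = {}  # address -> time of last announce

    def announce(self, address, now=None):
        self.nodes[address] = now if now is not None else time.time()

    def peers(self, now=None):
        now = now if now is not None else time.time()
        return sorted(a for a, t in self.nodes.items() if now - t < self.ttl)
```

A new node would ask any tracker for `peers()` and use the result as the seed list for its first gossip round.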

Another approach would be, as someone suggested, keeping a special database
on each node with a list of peers.  On every replication, nodes would also
replicate this database so that everybody converges on the same list.  Peers
could be marked as inactive if they time out consistently.  The tricky part
here is what address to give new nodes as a starting point for their first
replication.
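The "mark inactive after consistent timeouts" rule might look like this.  The document shape, the `MAX_FAILURES` threshold, and the `record_result` helper are all assumptions for the sketch, not anything CouchDB provides.

```python
MAX_FAILURES = 3  # assumed: consecutive timeouts before marking inactive

def record_result(peer_doc, ok):
    """Update one peer document after a replication attempt.

    `peer_doc` is a dict shaped like a document in the shared peers
    database, e.g. {"address": ..., "failures": 0, "active": True}.
    A success resets the failure count; enough consecutive failures
    flip the peer to inactive.
    """
    if ok:
        peer_doc["failures"] = 0
        peer_doc["active"] = True
    else:
        peer_doc["failures"] += 1
        if peer_doc["failures"] >= MAX_FAILURES:
            peer_doc["active"] = False
    return peer_doc
```

Because the peers database is itself replicated, an inactive flag written by one node eventually reaches everyone, at the cost of the usual conflict handling if two nodes disagree.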

The problem that I see with a bus of updates is that update sequences are
local and will likely not match up from node to node.  You could use
timestamps instead of update sequences if you expect all of the nodes to
have relatively synchronized clocks, but I suggest avoiding that path.
Replication in CouchDB is cheap, and it is fine to have lots of redundant
replication attempts.
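A toy illustration of why the sequences diverge: each node numbers writes in its own arrival order, so the same documents end up with different sequence numbers on different nodes.  (`apply_writes` is invented for the illustration; it just mimics a node assigning local update sequence numbers.)

```python
def apply_writes(writes):
    """Assign local update sequence numbers in arrival order,
    the way each CouchDB node numbers its own writes."""
    return {doc: seq for seq, doc in enumerate(writes, start=1)}

# The same three documents arrive in a different order on each node:
node_a = apply_writes(["x", "y", "z"])
node_b = apply_writes(["z", "x", "y"])
```

Here `"x"` is seq 1 on node A but seq 2 on node B, so a bus keyed on update sequence has no shared frame of reference.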

Some sort of supervisory agent would be required for any solution.  But
CouchDB's replication ability should make the design of that agent far
easier than it would be with most other systems.

On Oct 26, 2009 9:13 AM, "Adam Kocoloski" <kocolosk@apache.org> wrote:

On Oct 26, 2009, at 11:35 AM, Chris Anderson wrote:
> On Mon, Oct 26, 2009 at 8:33 AM, Miles Fidelm...
Sounds that way to me, too, although that could be because CouchDB is the
hammer I know really well.

I'm still trying to figure out how multicast fits into the picture.  I can
see it really helping to reduce bandwidth and server load in a case where
the nodes are all expected to be online 100% of the time, but if nodes are
coming and going they're likely to be requesting feeds at different starting
sequences much of the time.  What's the win in that case?

Best, Adam
