incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <>
Subject Re: massive replication?
Date Mon, 26 Oct 2009 18:00:38 GMT

This sounds like what I was trying to propose. More concretely:

> I keep coming back to NNTP (USENET) as a model for many-to-many messaging:
> - push a message into a newsgroup on any NNTP node subscribing to that
> newsgroup

A message group is persisted in a DB. In the future, a single db with
filtered replication could work, but a db per group will probably be
easier. Not all nodes have all db's.

> - nodes exchange "I-have"/"You-have" on a regular basis

Replication of the "network status db" that holds what nodes have what
update_seq/host pairs. The pairing is important. The rate of
replication obviously affects delays and what not.

> - message propagate to all subscribing nodes by essentially a flooding or
> epidemic routing mechanism

This is the fun part. Given your local copy of a group, how do you
pick a peer to pull from. Or push to if you're feeling proactive. Do
we sync in both directions, etc etc. Depending on the behavior this
could manifest in multiple ways. If you have a single authority per
'subscription source' then you'd want to keep
"authority/update_seq/located_at" triples. This way you know that if
the largest update_seq seen is N, then you can replicate from any node
that has that sequence.

> - pretty quickly, a message propagates to all nodes subscribing to the
> newsgroup

Message delivery via replication. Judging the quickly part would
require a more formal definition of quickly and then building the
thing. Depending on your definition of quickly there are definitely
different types of design decisions to be made.

> - lack of connectivity simply delays message propagation

Or reroutes through an alternate node. If we know that four nodes have
a copy of the db, we can replicate from any that are alive.
Replication is incremental, a link can disappear in the middle of a
replication and it'll resume from the previous check point.

> - the whole system scales massively, and is very robust in the face of
> connectivity outages, node failures, etc. (messages can flow across multiple
> routes)

It scales on my brain debugger assuming the propagation algorithm
isn't extremely naïve. But as I said, I haven't built it yet so who

> In some sense, what I'm thinking of would look a lot like:
> - a group of CouchDB nodes all subscribe to a newsgroup

I'm confused by the term subscription here. Generally I'd assume that
means that a foreign host knows that the local node is interested in
something. For fully distributed awesome, I think it'd be better
phrased as "nodes can declare what they want" and the algorithm will
attempt to maintain their local state somewhere close to the global
state. Kind of like the difference between newspaper delivery and
buying a newspaper at any one of the many shops in town.

> - each node publishes changes as messages to that newsgroup

$ curl -X PUT -d '{"message": "First post!!1!1"}'

Part of the replication routing could include a proactive step in
pushing messages to nodes it  knows care about a message.

> - NNTP takes care of getting messages everywhere, eventually

The 'network state in a db' means that regardless of who's interested
in what, if we make sure that the first step in node communication is
a 'i know about these endpoints in these states' then we'll push
information to the people that care. Eventually.

> - each node looks for incoming messages and applies them as changes

Replication FTW!

> - use a shared key to secure things (note: some implementations of NNTP
> already support secure messaging)

This is slightly harder. Read only means you'd need something infront
of couch to prevent readers. But OAuth or distributing SSL certs or
similar wouldn't be out of the question.

> A similar approach could be taken using:
> - a distributed hash table as a message que (that's what spread and splines
> seem to do)

I haven't looked to hard at these, but "DHT as message queue" seems
contradictory to the idea that most DHT's (all?) that I know of don't
allow range queries.

> - the DIS or HLA protocols (used for distributed simulation - keeping
> multiple copies of a "world" synchronized)

I couldn't get past the first wiki page Google gave me, so, I dunno.

Paul Davis

View raw message