incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: massive replication?
Date Mon, 26 Oct 2009 18:29:09 GMT
Miles,

Oh. Gotchya. But the fun part is the protocol implementation ;)

Paul Davis

On Mon, Oct 26, 2009 at 2:21 PM, Miles Fidelman
<mfidelman@meetinghouse.net> wrote:
> Paul,
>
> I'm actually suggesting using NNTP infrastructure, or something like it, to
> propagate updates - rather than trying to reinvent the protocol and
> supporting infrastructure in CouchDB.
> Something like:
> change -> local CouchDB instance -<continuous replication (output)> -> local
> NNTP daemon (specific newsgroup) -> "the net"
>
> "the net" -> local NNTP daemon (specific newsgroup) -> <continuous
> replication (input) -> local CouchDB instance
>
> All messages eventually get to all Couch instances - but the order and delay
> can vary considerably.
>
> Miles
>
>
> Paul Davis wrote:
>>
>> Miles,
>>
>> This sounds like what I was trying to propose. More concretely:
>>
>>
>>>
>>> I keep coming back to NNTP (USENET) as a model for many-to-many
>>> messaging:
>>>
>>> - push a message into a newsgroup on any NNTP node subscribing to that
>>> newsgroup
>>>
>>
>> A message group is persisted in a DB. In the future, a single db with
>> filtered replication could work, but a db per group will probably be
>> easier. Not all nodes have all db's.
>>
>>
>>>
>>> - nodes exchange "I-have"/"You-have" on a regular basis
>>>
>>
>> Replication of the "network status db" that holds what nodes have what
>> update_seq/host pairs. The pairing is important. The rate of
>> replication obviously affects delays and what not.
>>
>>
>>>
>>> - message propagate to all subscribing nodes by essentially a flooding or
>>> epidemic routing mechanism
>>>
>>
>> This is the fun part. Given your local copy of a group, how do you
>> pick a peer to pull from. Or push to if you're feeling proactive. Do
>> we sync in both directions, etc etc. Depending on the behavior this
>> could manifest in multiple ways. If you have a single authority per
>> 'subscription source' then you'd want to keep
>> "authority/update_seq/located_at" triples. This way you know that if
>> the largest update_seq seen is N, then you can replicate from any node
>> that has that sequence.
>>
>>
>>>
>>> - pretty quickly, a message propagates to all nodes subscribing to the
>>> newsgroup
>>>
>>
>> Message delivery via replication. Judging the quickly part would
>> require a more formal definition of quickly and then building the
>> thing. Depending on your definition of quickly there are definitely
>> different types of design decisions to be made.
>>
>>
>>>
>>> - lack of connectivity simply delays message propagation
>>>
>>
>> Or reroutes through an alternate node. If we know that four nodes have
>> a copy of the db, we can replicate from any that are alive.
>> Replication is incremental, a link can disappear in the middle of a
>> replication and it'll resume from the previous check point.
>>
>>
>>>
>>> - the whole system scales massively, and is very robust in the face of
>>> connectivity outages, node failures, etc. (messages can flow across
>>> multiple
>>> routes)
>>>
>>
>> It scales on my brain debugger assuming the propagation algorithm
>> isn't extremely naīve. But as I said, I haven't built it yet so who
>> knows.
>>
>>
>>>
>>> In some sense, what I'm thinking of would look a lot like:
>>>
>>> - a group of CouchDB nodes all subscribe to a newsgroup
>>>
>>
>> I'm confused by the term subscription here. Generally I'd assume that
>> means that a foreign host knows that the local node is interested in
>> something. For fully distributed awesome, I think it'd be better
>> phrased as "nodes can declare what they want" and the algorithm will
>> attempt to maintain their local state somewhere close to the global
>> state. Kind of like the difference between newspaper delivery and
>> buying a newspaper at any one of the many shops in town.
>>
>>
>>>
>>> - each node publishes changes as messages to that newsgroup
>>>
>>
>> $ curl -X PUT -d '{"message": "First post!!1!1"}'
>> http://127.0.0.1:5984/newsgroup/memesrus
>>
>> Part of the replication routing could include a proactive step in
>> pushing messages to nodes it  knows care about a message.
>>
>>
>>>
>>> - NNTP takes care of getting messages everywhere, eventually
>>>
>>
>> The 'network state in a db' means that regardless of who's interested
>> in what, if we make sure that the first step in node communication is
>> a 'i know about these endpoints in these states' then we'll push
>> information to the people that care. Eventually.
>>
>>
>>>
>>> - each node looks for incoming messages and applies them as changes
>>>
>>
>> Replication FTW!
>>
>>
>>>
>>> - use a shared key to secure things (note: some implementations of NNTP
>>> already support secure messaging)
>>>
>>
>> This is slightly harder. Read only means you'd need something infront
>> of couch to prevent readers. But OAuth or distributing SSL certs or
>> similar wouldn't be out of the question.
>>
>>
>>>
>>> A similar approach could be taken using:
>>> - a distributed hash table as a message que (that's what spread and
>>> splines
>>> seem to do)
>>>
>>
>> I haven't looked to hard at these, but "DHT as message queue" seems
>> contradictory to the idea that most DHT's (all?) that I know of don't
>> allow range queries.
>>
>>
>>>
>>> - the DIS or HLA protocols (used for distributed simulation - keeping
>>> multiple copies of a "world" synchronized)
>>>
>>
>> I couldn't get past the first wiki page Google gave me, so, I dunno.
>>
>> HTH,
>> Paul Davis
>>
>
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.   .... Yogi Berra
>
>
>

Mime
View raw message