incubator-couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: massive replication?
Date Mon, 26 Oct 2009 16:12:56 GMT
On Oct 26, 2009, at 11:35 AM, Chris Anderson wrote:

> On Mon, Oct 26, 2009 at 8:33 AM, Miles Fidelman
> <mfidelman@meetinghouse.net> wrote:
>> Adam Kocoloski wrote:
>>>
>>> On Oct 26, 2009, at 10:45 AM, Miles Fidelman wrote:
>>>
>>>> The environment we're looking at is more of a mesh where connectivity is
>>>> coming up and down - think mobile ad hoc networks.
>>>>
>>>> I like the idea of a replication bus, perhaps using something like spread
>>>> (http://www.spread.org/) or spines (www.spines.org) as a multi-cast fabric.
>>>>
>>>> I'm thinking something like continuous replication - but where the
>>>> updates are pushed to a multi-cast port rather than to a specific node,
>>>> with each node subscribing to update feeds.
>>>>
>>>> Anybody have any thoughts on how that would play with the current
>>>> replication and conflict resolution schemes?
>>>>
>>>> Miles Fidelman
>>>
>>> Hi Miles, this sounds like really cool stuff.  Caveat: I have no
>>> experience using Spread/Spines and very little experience with IP
>>> multicasting, which I guess is what those tools try to reproduce in
>>> internet-like environments.  So bear with me if I ask stupid questions.
>>>
>>> 1) Would the CouchDB servers be responsible for error detection and
>>> correction?  I imagine that complicates matters considerably, but it
>>> wouldn't be impossible.
>>
>> Good question.  I hadn't quite thought that far ahead.  I think the basic
>> answer is no (assume reliable multicast), but... some kind of healing
>> mechanism would probably be required (see below).
>>>
>>> 2) When these CouchDB servers drop off for an extended period and then
>>> rejoin, how do they subscribe to the update feed from the replication
>>> bus at a particular sequence?  This is really the key element of the
>>> setup.  When I think of multicasting I think of video feeds and such,
>>> where if you drop off and rejoin you don't care about the old stuff you
>>> missed.  That's not the case here.  Does the bus store all this old feed
>>> data?
>>
>> Think of something like RSS, but with distributed infrastructure.
>> A node would publish an update to a specific address (e.g., like
>> publishing an RSS feed).
>>
>> All nodes would subscribe to the feed, and receive new messages in
>> sequence.  When picking up updates, you ask for everything after a
>> particular sequence number.  The update service maintains the data.
>
> The best candidate for an update service like this is probably a CouchDB.

Sounds that way to me, too, although that could be because CouchDB is  
the hammer I know really well.

I'm still trying to figure out how multicast fits into the picture.  I  
can see it really helping to reduce bandwidth and server load in a  
case where the nodes are all expected to be online 100% of the time,  
but if nodes are coming and going they're likely to be requesting  
feeds at different starting sequences much of the time.  What's the  
win in that case?

Best, Adam
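
To make the question above concrete, here is a minimal Python sketch of the "ask for everything after a particular sequence number" model discussed in this thread. It is only an illustration under stated assumptions: the names UpdateFeed, Node, since, and catch_up are hypothetical, the revision string is a placeholder, and no CouchDB, Spread, or Spines API is used. The idea is loosely analogous to asking a changes feed for entries since a given sequence.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateFeed:
    """Hypothetical update service: an append-only log keyed by sequence number."""
    log: list = field(default_factory=list)

    def publish(self, doc_id, rev):
        # Sequence numbers start at 1 and increase by one per update.
        entry = (len(self.log) + 1, doc_id, rev)
        self.log.append(entry)
        return entry  # in the proposed setup this would go out once over multicast

    def since(self, seq):
        # Everything strictly after the given sequence number.
        return [e for e in self.log if e[0] > seq]


@dataclass
class Node:
    """Hypothetical subscriber that remembers the last sequence it applied."""
    name: str
    last_seq: int = 0
    online: bool = True

    def receive(self, entry):
        # Apply only the next expected update; gaps are left for catch_up.
        if entry[0] == self.last_seq + 1:
            self.last_seq = entry[0]

    def catch_up(self, feed):
        # A rejoining node asks for its own backlog (a per-node request).
        missed = feed.since(self.last_seq)
        for entry in missed:
            self.receive(entry)
        return len(missed)


def demo():
    feed = UpdateFeed()
    a, b, c = Node("a"), Node("b"), Node("c")
    for i in range(1, 7):
        if i == 3:
            b.online = False  # b drops off after seq 2
        if i == 5:
            c.online = False  # c drops off after seq 4
        entry = feed.publish(f"doc{i}", "rev-placeholder")
        for node in (a, b, c):
            if node.online:
                node.receive(entry)  # one send reaches every online node
    # On rejoin, each node needs a different slice of the log.
    return {node.name: node.catch_up(feed) for node in (a, b, c)}
```

In this sketch the always-online node never needs a catch-up request, while the two nodes that dropped off each ask for a different starting sequence, so their backlogs have to be served individually rather than from a single shared transmission.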


