incubator-couchdb-user mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: How fast do CouchDB propagate changes to other nodes?
Date Sun, 19 Dec 2010 00:31:47 GMT
On Sat, Dec 18, 2010 at 16:14, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 4:46 PM, Randall Leeds <randall.leeds@gmail.com> wrote:
>> On Sat, Dec 18, 2010 at 04:00, Robert Dionne
>> <dionne@dionne-associates.com> wrote:
>>>
>>> On Dec 17, 2010, at 6:07 PM, Randall Leeds wrote:
>>>>
>>>> keeping cluster information and database metadata up to date around
>>>> the cluster, but that information tends to be small and changes
>>>> infrequently.
>>>>
>>>> However, to me this sounds like a lot of work for something that might
>>>> be better solved using technologies like zeromq, particularly if
>>>> logging all messages is optional.
>>>>
>>>> Anyway, I'm happy to talk about all of this further since I think it's
>>>> kind of fascinating. I've been thinking a lot recently about how flood
>>>
>>> I'm curious, is flood replication what the name implies? Broadcasting?
>>>
>>
>> I'll throw this at dev@, too.
>>
>> Yes, broadcasting.
>>
>> I've been thinking about alternative checkpoint schemes that take the
>> source and destination host out of the equation and figure out some
>> other way to verify common history. I imagine it's going to have to
>> involve a hash tree.
>>
>> With the ability to resolve common history without having *directly*
>> exchanged checkpoints, hosts could receive incremental update batches
>> from different hosts if the replication graph changes over time.
>>
>> Anyway, it's just a little infant of a thought, but I think it's a
>> good one to have in our collective consciousness.
>>
>> Randall
>>
>
> Random off the top of my head response:
>
> I don't see anything immediately following from what you describe.
> Even if you had a way of saying "I already have this revision" there's
> no real way to figure out where to start once you get rid of the
> src/dst/seq triplet (that I can think of).
>
> Though an interesting observation is that replication never really
> deletes anything in a history. As a quick optimization that could
> lead to where you're wanting to go, you might look into storing a Bloom
> filter for the database that holds a hash of the docid/rev pair for
> every incoming edit. The replicator could then use it to speed up
> replication when it already has edits from the source db.
>
> Assuming you update that filter in real time and can update
> in-progress replications, you should be able to get interesting patterns
> of edits moving through a cluster.
>
> Or something to that effect.
>
> Paul
>
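
A minimal Python sketch of the Bloom-filter suggestion above. The names (RevBloomFilter, partition_changes), the filter size, and the SHA-256-based hashing are illustrative assumptions, not CouchDB code. The observation that replication never deletes anything from a history matters here: a plain Bloom filter only supports inserts, so it never needs to forget an edit. Because a Bloom filter can give false positives but never false negatives, the safe use is to fetch "never seen" edits straight away and still run an exact check on "probably seen" ones.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple

Edit = Tuple[str, str]  # (docid, rev), e.g. ("mydoc", "2-abc")


class RevBloomFilter:
    """Hypothetical per-database Bloom filter over every (docid, rev) pair received."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, docid: str, rev: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        key = f"{docid}\x00{rev}".encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, docid: str, rev: str) -> None:
        # Called for every incoming edit, whether local or replicated.
        for pos in self._positions(docid, rev):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, docid: str, rev: str) -> bool:
        # False means "definitely never seen"; True means "probably seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(docid, rev))


def partition_changes(target: RevBloomFilter,
                      source_changes: Iterable[Edit]) -> Tuple[List[Edit], List[Edit]]:
    """Split source edits into (definitely_missing, needs_check).

    Definitely-missing edits can be fetched without asking the target;
    the rest still need an exact check (in CouchDB, something like
    POST /db/_revs_diff) because the filter may report false positives.
    """
    missing: List[Edit] = []
    needs_check: List[Edit] = []
    for docid, rev in source_changes:
        if target.might_contain(docid, rev):
            needs_check.append((docid, rev))
        else:
            missing.append((docid, rev))
    return missing, needs_check
```

The payoff, under these assumptions, is that a target which has seen few of the source's edits skips most of the exact-check round trips, while correctness still rests on the exact check.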

Maybe I wasn't clear. There may be a place for a Bloom filter here, but
I was thinking something along the lines of "Hey, we both have
history up to this point that's common, even if we didn't receive
those edits from the same place." If you imagine we had a hash tree of
every edit, you could maybe do some back-and-forth bisection, comparing
what the two histories look like, to find a common ancestor.
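
A rough Python sketch of the bisection idea, under a strong simplifying assumption: each host keeps a hash chain over its edit log (a degenerate hash tree where entry i covers the first i edits), and both logs list their shared edits in the same order. Comparing hashes at a midpoint tells you whether the prefixes up to there agree, so a binary search finds the common ancestor in O(log n) exchanges. The function names and the remote_hash_at callback (standing in for one network round trip) are hypothetical. Shared edits recorded in different orders, which can happen when they arrive from different places, would need an order-independent structure such as a Merkle tree over sorted docid/rev pairs; this sketch does not handle that.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple

Edit = Tuple[str, str]  # (docid, rev)


def prefix_hashes(history: Sequence[Edit]) -> List[bytes]:
    """Return h where h[i] hashes the first i edits (h[0] covers the empty history)."""
    h = hashlib.sha256(b"").digest()
    hashes = [h]
    for docid, rev in history:
        # Chain each edit onto the previous digest, so equal h[i] implies
        # equal prefixes (barring hash collisions).
        h = hashlib.sha256(h + f"{docid}\x00{rev}".encode("utf-8")).digest()
        hashes.append(h)
    return hashes


def common_prefix_length(local_hashes: List[bytes],
                         remote_hash_at: Callable[[int], bytes],
                         remote_len: int) -> int:
    """Binary-search the longest prefix both hosts share.

    remote_hash_at(i) stands in for asking the other host for its h[i].
    """
    lo, hi = 0, min(len(local_hashes) - 1, remote_len)
    # Invariant: the first `lo` edits agree (the empty prefix always does).
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if remote_hash_at(mid) == local_hashes[mid]:
            lo = mid      # prefixes agree up to mid; look further right
        else:
            hi = mid - 1  # divergence happens at or before edit mid
    return lo


a = [("d1", "1-a"), ("d2", "1-b"), ("d3", "1-c"), ("d4", "1-x")]
b = [("d1", "1-a"), ("d2", "1-b"), ("d3", "1-c"), ("d5", "1-y"), ("d6", "1-z")]
ha, hb = prefix_hashes(a), prefix_hashes(b)
print(common_prefix_length(ha, lambda i: hb[i], len(b)))  # prints 3
```

Once the common prefix is known, each side only needs to send the edits after it, which is the "incremental update batches from different hosts" case without a src/dst/seq checkpoint.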

Anyway, the problem is definitely hard, but I'm glad to talk about it whenever.
