On Feb 4, 2010, at 5:05 PM, Randall Leeds wrote:
> On Thu, Feb 4, 2010 at 08:17, Adam Kocoloski <kocolosk@apache.org> wrote:
>>
>> If we went ahead and implemented this I think the UUID becomes superfluous from the
>> replicator's perspective. You wouldn't want to restrict this Merkle tree check to UUID-matched
>> DBs, as it would be useful for reducing entropy in a sharded database cluster that stores
>> multiple copies of each document in different database shards. In fact, IIRC that was a Dynamo
>> feature in the original Amazon paper.
>
> I mostly follow and I think I agree.
> Can you clarify "as it would be useful for reducing entropy..."?
>
> Randall

Sure, that was too terse on my part. I'm referring to the case where you're promising to
write N copies of a document in your cluster, but for whatever reason you only succeed W<N
times. Hence "entropy" -- the N shards start diverging from one another after transient failures.
You want those missing writes to eventually propagate to the N-W shards that didn't get them.
CouchDB's _changes replication works for this purpose, but it's relatively resource-intensive
because it checks for the existence of every update on the target. I suspect that comparing
Merkle trees may be a more efficient way to figure out what to replicate in this special case
where the two DBs are always supposed to be identical.

Cheers,
Adam
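
To make the Merkle tree idea above concrete, here is a minimal Python sketch. It is not CouchDB code, and everything in it is an assumption made up for illustration: the fixed bucket count, the "doc_id:rev" leaf encoding, and the function names build_tree and diff_buckets. Each shard hashes its documents into buckets, builds a hash tree over the buckets, and the comparison descends only into subtrees whose hashes differ.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(doc_id: str, n_buckets: int) -> int:
    # Hypothetical placement rule: hash the doc id into a fixed bucket.
    return int.from_bytes(_h(doc_id.encode())[:4], "big") % n_buckets


def build_tree(docs: dict, n_buckets: int = 8) -> list:
    """Return the tree as a list of levels: leaves first, root last.

    docs maps doc id -> current revision string.
    n_buckets is assumed to be a power of two.
    """
    buckets = [[] for _ in range(n_buckets)]
    for doc_id, rev in docs.items():
        buckets[bucket_of(doc_id, n_buckets)].append(f"{doc_id}:{rev}")
    # Leaf hash covers the sorted (id, rev) pairs in that bucket.
    level = [_h("\n".join(sorted(b)).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_buckets(a: list, b: list) -> list:
    """Return indices of leaf buckets whose contents differ."""
    top = len(a) - 1
    pending = [0] if a[top][0] != b[top][0] else []
    # Walk from the root down, following only mismatched subtrees.
    for depth in range(top, 0, -1):
        nxt = []
        for i in pending:
            for child in (2 * i, 2 * i + 1):
                if a[depth - 1][child] != b[depth - 1][child]:
                    nxt.append(child)
        pending = nxt
    return pending


if __name__ == "__main__":
    shard_a = {"doc1": "1-abc", "doc2": "2-def", "doc3": "1-xyz"}
    shard_b = {"doc1": "1-abc", "doc2": "2-def"}  # missed the doc3 write
    print(diff_buckets(build_tree(shard_a), build_tree(shard_b)))
```

When the two shards are identical, the root hashes match and the comparison stops after a single check. Otherwise only the documents in the returned buckets would need a per-document comparison, rather than every update on the target.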