couchdb-dev mailing list archives

From Adam Kocoloski <adam.kocolo...@gmail.com>
Subject Re: proposed replication rev history changes
Date Mon, 09 Feb 2009 17:42:11 GMT
On Feb 9, 2009, at 1:34 AM, Antony Blakey wrote:

> On 09/02/2009, at 4:01 PM, Adam Kocoloski wrote:
>
>> Ok, thanks for the clarification.  I don't see any major downsides  
>> beyond the ones you already mentioned. The inability to replicate  
>> between versions is a bit of a bummer -- I'd want to at least look  
>> into a bridge that lets old servers replicate to new ones.
>>
>> Your point about reducing the chance of collision is a good one,  
>> especially since Couch is using a 32 bit sample space for revision  
>> IDs.  The probability of zero collisions between any two revisions  
>> in a given document history is
>>
>> N!/((N-M)! * N^M)
>>
>> with N = 2**32 and M = "max rev history".  With M = 128, that  
>> probability drops to 0.999998.  In a 400k document DB where each  
>> doc has the max number of revisions it's likely that at least one  
>> has a duplicate rev.  That's no good.  I think we could eventually  
>> see transient cases of revisions being skipped by the replicator  
>> with the trunk code.
>
> If the revision were an SHA hash (admittedly), wouldn't the  
> increased value space, AND the fact that identical rev == identical  
> document, greatly relieve this problem?

Yes, we do plan to use a hash of the document content for the revision  
at some point.  You're right, we'd need to also increase the value  
space at the same time to actually relieve the collision problem.  160  
bits (or more) may be overkill, though.  We'll have to find some  
middle ground balancing collision probability and resource usage.

Best,
Adam
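
For anyone who wants to check the numbers quoted above, here is a minimal sketch in Python. It assumes revision IDs are drawn uniformly at random from a space of 2**bits values, and it evaluates N!/((N-M)! * N^M) as the product of (1 - k/N) for k = 0..M-1, summed in log space so it stays accurate for large N. The function names and the list of candidate bit widths are made up for illustration; none of this is CouchDB code.

```python
import math


def log_p_no_collision(bits: int, m: int) -> float:
    """Log of the probability that m random `bits`-bit IDs are all distinct.

    Equivalent to log(N! / ((N-M)! * N^M)) with N = 2**bits and M = m.
    """
    n = 2 ** bits
    return sum(math.log1p(-k / n) for k in range(m))


def p_no_collision(bits: int, m: int) -> float:
    """Probability of zero collisions within one document's rev history."""
    return math.exp(log_p_no_collision(bits, m))


def p_any_collision_in_db(bits: int, m: int, docs: int) -> float:
    """Probability that at least one of `docs` documents, each holding m
    revisions, contains a duplicate rev (documents treated as independent)."""
    return -math.expm1(docs * log_p_no_collision(bits, m))


if __name__ == "__main__":
    # 32 bits is the trunk case discussed above; the wider sizes are
    # hypothetical candidates for a "middle ground", not proposals.
    for bits in (32, 64, 128, 160):
        print(bits,
              p_no_collision(bits, 128),
              p_any_collision_in_db(bits, 128, 400_000))
```

With bits = 32 and M = 128 the per-document figure matches the 0.999998 quoted above, and the 400k-document case comes out at a bit better than even odds of at least one duplicate. Widening the ID to 64 bits already pushes the database-wide probability down by many orders of magnitude under the same assumptions.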
