couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: svn commit: r1043461 - /couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
Date Wed, 08 Dec 2010 16:12:31 GMT
With this patch applied, in ~99% of cases, yes.

Best,
Adam

On Dec 8, 2010, at 10:54 AM, Sebastian Cohnen wrote:

> Do I read this correctly: two normal compaction runs will take care of dupes in both _all_docs and _changes?
> 
> On 08.12.2010, at 16:48, kocolosk@apache.org wrote:
> 
>> Author: kocolosk
>> Date: Wed Dec  8 15:48:52 2010
>> New Revision: 1043461
>> 
>> URL: http://svn.apache.org/viewvc?rev=1043461&view=rev
>> Log:
>> Usort the infos during compaction to remove dupes, COUCHDB-968
>> 
>> This is not a bulletproof solution; it only removes dupes when
>> they appear in the same batch of 1000 updates.  However, for dupes
>> that show up in _all_docs the probability of that happening is quite
>> high.  If the dupes are only in _changes a user may need to compact
>> twice, once to get the dupes ordered together and a second time to
>> remove them.
>> 
>> A more complete solution would be to trigger the compaction in "retry"
>> mode, but this is significantly slower.
>> 
>> Modified:
>>   couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
>> 
>> Modified: couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
>> URL: http://svn.apache.org/viewvc/couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl?rev=1043461&r1=1043460&r2=1043461&view=diff
>> ==============================================================================
>> --- couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl (original)
>> +++ couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl Wed Dec  8 15:48:52 2010
>> @@ -775,7 +775,10 @@ copy_rev_tree_attachments(SrcDb, DestFd,
>>        end, Tree).
>> 
>> 
>> -copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq, Retry) ->
>> +copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq0, Retry) ->
>> +    % COUCHDB-968, make sure we prune duplicates during compaction
>> +    InfoBySeq = lists:usort(fun(#doc_info{id=A}, #doc_info{id=B}) -> A =< B end,
>> +        InfoBySeq0),
>>    Ids = [Id || #doc_info{id=Id} <- InfoBySeq],
>>    LookupResults = couch_btree:lookup(Db#db.fulldocinfo_by_id_btree, Ids),
>> 
>> 
>> 
> 
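
To illustrate the limitation described in the commit log, here is a minimal sketch of per-batch deduplication. It is not CouchDB code: the module name batch_dedup, the {Id, Seq} tuples standing in for #doc_info{} records, and the batch size of 3 (the commit mentions batches of 1000) are all assumptions made for the example. It only shows that lists:usort/2 with an id-based ordering fun drops dupes that share a batch and leaves dupes that are split across batches.

```erlang
-module(batch_dedup).
-export([dedup_batches/2, demo/0]).

%% Apply an id-based usort to each batch independently, mirroring the
%% idea in the patch. Infos is a list of {Id, Seq} tuples (a hypothetical
%% stand-in for #doc_info{} records).
dedup_batches(Infos, BatchSize) when is_integer(BatchSize), BatchSize > 0 ->
    lists:append([usort_by_id(Batch) || Batch <- chunk(Infos, BatchSize)]).

%% Same ordering fun shape as in the patch: entries whose ids compare
%% equal are treated as duplicates, and only one of them is kept.
usort_by_id(Batch) ->
    lists:usort(fun({A, _}, {B, _}) -> A =< B end, Batch).

%% Split a list into consecutive sublists of at most N elements.
chunk([], _N) ->
    [];
chunk(List, N) when length(List) =< N ->
    [List];
chunk(List, N) ->
    {Head, Tail} = lists:split(N, List),
    [Head | chunk(Tail, N)].

%% Id a is duplicated inside the first batch, id b across two batches.
demo() ->
    Infos = [{a, 1}, {b, 2}, {a, 3}, {c, 4}, {d, 5}, {b, 6}],
    [Id || {Id, _Seq} <- dedup_batches(Infos, 3)].
```

Calling batch_dedup:demo() returns the ids [a,b,b,c,d]: the duplicate a is gone because both entries were in the same batch, while b survives twice because its entries fell into different batches.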

