couchdb-commits mailing list archives

Subject: svn commit: r1043527 - /couchdb/branches/0.11.x/src/couchdb/couch_db_updater.erl
Date: Wed, 08 Dec 2010 17:10:30 GMT
Author: kocolosk
Date: Wed Dec  8 17:10:30 2010
New Revision: 1043527

Usort the infos during compaction to remove dupes, COUCHDB-968

This is not a bulletproof solution; it only removes dupes when they
appear in the same batch of 1000 updates.  However, for dupes
that show up in _all_docs the probability of that happening is quite
high.  If the dupes are only in _changes a user may need to compact
twice, once to get the dupes ordered together and a second time to
remove them.

A more complete solution would be to trigger the compaction in "retry"
mode, but this is significantly slower.
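As an illustrative sketch (not part of the commit), the dedup semantics the
patch relies on can be shown in isolation. The module and record below are
simplified stand-ins for couch_db_updater's #doc_info{}; only the id field
matters for the comparison:

```erlang
%% Sketch of the COUCHDB-968 fix: lists:usort/2 sorts with the given fun
%% and drops all but the first of any elements that compare equal (i.e.
%% Fun(A,B) and Fun(B,A) both return true). Comparing only on id means
%% duplicate doc_infos collapse to one -- but only when both copies land
%% in the same list, i.e. the same batch of updates during compaction.
-module(usort_dupes).
-export([dedupe/1]).

%% Simplified stand-in for couch_db's #doc_info{} record.
-record(doc_info, {id, seq}).

dedupe(InfoBySeq0) ->
    lists:usort(fun(#doc_info{id=A}, #doc_info{id=B}) -> A =< B end,
                InfoBySeq0).
```

In the commit itself this call is made inline in copy_docs/4 rather than in a
helper, but the pruning behavior is the same.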


Modified: couchdb/branches/0.11.x/src/couchdb/couch_db_updater.erl
--- couchdb/branches/0.11.x/src/couchdb/couch_db_updater.erl (original)
+++ couchdb/branches/0.11.x/src/couchdb/couch_db_updater.erl Wed Dec  8 17:10:30 2010
@@ -760,7 +760,10 @@ copy_rev_tree_attachments(SrcDb, DestFd,
         end, Tree).
-copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq, Retry) ->
+copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq0, Retry) ->
+    % COUCHDB-968, make sure we prune duplicates during compaction
+    InfoBySeq = lists:usort(fun(#doc_info{id=A}, #doc_info{id=B}) -> A =< B end,
+        InfoBySeq0),
     Ids = [Id || #doc_info{id=Id} <- InfoBySeq],
     LookupResults = couch_btree:lookup(Db#db.fulldocinfo_by_id_btree, Ids),
