couchdb-dev mailing list archives

From Sebastian Cohnen <sebastiancoh...@googlemail.com>
Subject Re: svn commit: r1043461 - /couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
Date Wed, 08 Dec 2010 15:54:07 GMT
Do I read this correctly: two normal compaction runs will take care of dupes in both _all_docs
and _changes?

On 08.12.2010, at 16:48, kocolosk@apache.org wrote:

> Author: kocolosk
> Date: Wed Dec  8 15:48:52 2010
> New Revision: 1043461
> 
> URL: http://svn.apache.org/viewvc?rev=1043461&view=rev
> Log:
> Usort the infos during compaction to remove dupes, COUCHDB-968
> 
> This is not a bulletproof solution; it only removes dupes when
> they appear in the same batch of 1000 updates.  However, for dupes
> that show up in _all_docs the probability of that happening is quite
> high.  If the dupes are only in _changes a user may need to compact
> twice, once to get the dupes ordered together and a second time to
> remove them.
> 
> A more complete solution would be to trigger the compaction in "retry"
> mode, but this is significantly slower.
> 
> Modified:
>    couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
> 
> Modified: couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
> URL: http://svn.apache.org/viewvc/couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl?rev=1043461&r1=1043460&r2=1043461&view=diff
> ==============================================================================
> --- couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl (original)
> +++ couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl Wed Dec  8 15:48:52 2010
> @@ -775,7 +775,10 @@ copy_rev_tree_attachments(SrcDb, DestFd,
>         end, Tree).
> 
> 
> -copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq, Retry) ->
> +copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq0, Retry) ->
> +    % COUCHDB-968, make sure we prune duplicates during compaction
> +    InfoBySeq = lists:usort(fun(#doc_info{id=A}, #doc_info{id=B}) -> A =< B end,
> +        InfoBySeq0),
>     Ids = [Id || #doc_info{id=Id} <- InfoBySeq],
>     LookupResults = couch_btree:lookup(Db#db.fulldocinfo_by_id_btree, Ids),
> 
> 
> 
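To illustrate the "same batch" limitation described in the commit log, here is a minimal sketch, not CouchDB code. It assumes plain {Id, Seq} tuples as a stand-in for #doc_info{} records, and the module name batch_dedup_sketch and its functions are hypothetical. Each batch is passed through the same lists:usort/2 call the patch adds to copy_docs/4. That call keeps only one entry among those whose ids compare equal, so a duplicate is dropped only if both copies fall into the same batch.

```erlang
-module(batch_dedup_sketch).
-export([dedup_batches/2, chunk/2]).

%% Hypothetical stand-in for #doc_info{}: {Id, Seq} tuples.
%% Same ordering fun as the patch: entries with equal ids compare
%% equal, so lists:usort/2 keeps only one of them.
dedup_batch(Batch) ->
    lists:usort(fun({A, _}, {B, _}) -> A =< B end, Batch).

%% Split a list into consecutive batches of at most N elements
%% (N = 1000 in the real compactor, per the commit log).
chunk([], _N) ->
    [];
chunk(List, N) when length(List) =< N ->
    [List];
chunk(List, N) ->
    {Batch, Rest} = lists:split(N, List),
    [Batch | chunk(Rest, N)].

%% Deduplicate each batch independently, then concatenate the results.
%% Duplicates that land in different batches survive.
dedup_batches(InfoBySeq, N) ->
    lists:append([dedup_batch(B) || B <- chunk(InfoBySeq, N)]).
```

With a batch size of 4, dedup_batches([{a,1},{b,2},{a,3},{c,4}], 4) yields one entry each for ids a, b and c. With a batch size of 2 the two a entries sit in different batches and both are kept. This is the case where, according to the commit log, a second compaction is needed.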

