From: kocolosk@apache.org
To: commits@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Date: Sat, 18 Jul 2015 11:29:15 -0000
Subject: [2/2] couchdb commit: updated refs/heads/2735-duplicate-docs to 47d7b05

Ensure doc groups are sorted before merging them

We had been implicitly assuming that clients send us sorted groups, but
unsurprisingly that's not always the case. The additional sorting here
should be redundant, but the consequences of merging unsorted groups are
severe -- we can end up with uniqueness violations on the primary key in
the database -- and so we add an additional sort here.

COUCHDB-2735


Project: http://git-wip-us.apache.org/repos/asf/couchdb/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb/commit/47d7b05f
Tree: http://git-wip-us.apache.org/repos/asf/couchdb/tree/47d7b05f
Diff: http://git-wip-us.apache.org/repos/asf/couchdb/diff/47d7b05f

Branch: refs/heads/2735-duplicate-docs
Commit: 47d7b05fa63cb77ed7852a5d20f86720e6ac8de1
Parents: 5b1b3e1
Author: Adam Kocoloski
Authored: Fri Jul 17 19:20:36 2015 -0400
Committer: Adam Kocoloski
Committed: Fri Jul 17 19:20:36 2015 -0400

----------------------------------------------------------------------
 src/couchdb/couch_db_updater.erl | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/couchdb/blob/47d7b05f/src/couchdb/couch_db_updater.erl
----------------------------------------------------------------------
diff --git a/src/couchdb/couch_db_updater.erl b/src/couchdb/couch_db_updater.erl
index 947669c..c92097f 100644
--- a/src/couchdb/couch_db_updater.erl
+++ b/src/couchdb/couch_db_updater.erl
@@ -222,7 +222,7 @@ handle_cast(Msg, #db{name = Name} = Db) ->
 
 handle_info({update_docs, Client, GroupedDocs, NonRepDocs, MergeConflicts,
         FullCommit}, Db) ->
-    GroupedDocs2 = [[{Client, D} || D <- DocGroup] || DocGroup <- GroupedDocs],
+    GroupedDocs2 = sort_and_tag_groups(Client, GroupedDocs),
     if NonRepDocs == [] ->
         {GroupedDocs3, Clients, FullCommit2} = collect_updates(GroupedDocs2,
                 [Client], MergeConflicts, FullCommit);
@@ -291,8 +291,7 @@ collect_updates(GroupedDocsAcc, ClientsAcc, MergeConflicts, FullCommit) ->
             % updaters than deal with their possible conflicts, and local docs
             % writes are relatively rare. Can be optmized later if really needed.
         {update_docs, Client, GroupedDocs, [], MergeConflicts, FullCommit2} ->
-            GroupedDocs2 = [[{Client, Doc} || Doc <- DocGroup]
-                    || DocGroup <- GroupedDocs],
+            GroupedDocs2 = sort_and_tag_groups(Client, GroupedDocs),
             GroupedDocsAcc2 =
                 merge_updates(GroupedDocsAcc, GroupedDocs2, []),
             collect_updates(GroupedDocsAcc2, [Client | ClientsAcc],
@@ -302,6 +301,15 @@ collect_updates(GroupedDocsAcc, ClientsAcc, MergeConflicts, FullCommit) ->
     end.
 
 
+sort_and_tag_groups(Client, GroupedDocs) ->
+    % These groups should already be sorted but sometimes clients misbehave.
+    % The merge_updates function will fail and the database can end up with
+    % duplicate documents if the incoming groups are not sorted, so as a sanity
+    % check we sort them again here. See COUCHDB-2735.
+    Cmp = fun([{#doc{id=A}, _}|_], [{#doc{id=B}, _}|_]) -> A < B end,
+    SortedGroups = lists:sort(Cmp, GroupedDocs),
+    [[{Client, D} || D <- DocGroup] || DocGroup <- SortedGroups].
+
 btree_by_seq_split(#doc_info{id=Id, high_seq=KeySeq, revs=Revs}) ->
     {RevInfos, DeletedRevInfos} = lists:foldl(
         fun(#rev_info{deleted = false, seq = Seq} = Ri, {Acc, AccDel}) ->
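
The failure mode described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from couch_db_updater.erl: the module name merge_sketch, the functions merge/2, sort_groups/1 and ids/1, and the {Id, Body} tuple shape are all hypothetical, and merge/2 only assumes that merge_updates behaves like an ordinary two-way merge over groups sorted by document id.

```erlang
-module(merge_sketch).
-export([merge/2, sort_groups/1, ids/1]).

%% Hypothetical, simplified model (not the CouchDB data structures):
%% a "group" is a non-empty list of {Id, Body} tuples sharing one Id.

%% Two-way merge that ASSUMES both inputs are sorted by Id.
%% Groups with equal Ids are concatenated; otherwise the smaller
%% head is emitted first.
merge([], RestB) ->
    RestB;
merge(RestA, []) ->
    RestA;
merge([[{IdA, _} | _] = GroupA | RestA],
      [[{IdB, _} | _] = GroupB | RestB]) ->
    if
        IdA == IdB ->
            [GroupA ++ GroupB | merge(RestA, RestB)];
        IdA < IdB ->
            [GroupA | merge(RestA, [GroupB | RestB])];
        true ->
            [GroupB | merge([GroupA | RestA], RestB)]
    end.

%% Sort groups by the Id of their first element, in the spirit of
%% the sanity check added by this commit.
sort_groups(Groups) ->
    lists:sort(fun([{A, _} | _], [{B, _} | _]) -> A =< B end, Groups).

%% Return the Id of each group, in order.
ids(Groups) ->
    [Id || [{Id, _} | _] <- Groups].
```

With Unsorted = [[{<<"b">>, 1}], [{<<"a">>, 1}]] and Other = [[{<<"a">>, 2}]], ids(merge(Unsorted, Other)) evaluates to [<<"a">>, <<"b">>, <<"a">>], so the id <<"a">> ends up in two separate groups. Calling merge(sort_groups(Unsorted), Other) instead yields a single combined group for <<"a">> followed by the group for <<"b">>.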