From dev-return-17936-apmail-couchdb-dev-archive=couchdb.apache.org@couchdb.apache.org Fri Sep 2 10:30:01 2011 Return-Path: X-Original-To: apmail-couchdb-dev-archive@www.apache.org Delivered-To: apmail-couchdb-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0D8037D90 for ; Fri, 2 Sep 2011 10:30:01 +0000 (UTC) Received: (qmail 23125 invoked by uid 500); 2 Sep 2011 10:30:00 -0000 Delivered-To: apmail-couchdb-dev-archive@couchdb.apache.org Received: (qmail 22458 invoked by uid 500); 2 Sep 2011 10:29:52 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 22446 invoked by uid 99); 2 Sep 2011 10:29:48 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Sep 2011 10:29:48 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of dave@muse.net.nz designates 209.85.212.52 as permitted sender) Received: from [209.85.212.52] (HELO mail-vw0-f52.google.com) (209.85.212.52) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Sep 2011 10:29:39 +0000 Received: by vws16 with SMTP id 16so3145166vws.11 for ; Fri, 02 Sep 2011 03:29:18 -0700 (PDT) MIME-Version: 1.0 Received: by 10.52.180.39 with SMTP id dl7mr790428vdc.265.1314959358762; Fri, 02 Sep 2011 03:29:18 -0700 (PDT) Received: by 10.52.187.233 with HTTP; Fri, 2 Sep 2011 03:29:18 -0700 (PDT) X-Originating-IP: [125.236.236.206] In-Reply-To: <20110902043111.2A69823889E7@eris.apache.org> References: <20110902043111.2A69823889E7@eris.apache.org> Date: Fri, 2 Sep 2011 22:29:18 +1200 Message-ID: Subject: Re: svn commit: r1164350 - in /couchdb/trunk: share/www/script/test/recreate_doc.js src/couchdb/couch_doc.erl src/couchdb/couch_key_tree.erl From: Dave Cottlehuber To: dev@couchdb.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Paul, Love your work mate! Outstanding commit msg, I'm working through it (slowly= ). A+ D On 2 September 2011 16:31, wrote: > Author: davisp > Date: Fri Sep =A02 04:31:10 2011 > New Revision: 1164350 > > URL: http://svn.apache.org/viewvc?rev=3D1164350&view=3Drev > Log: > Fix introduction of duplicates into _changes feed > > When a document is updated the new update_seq is assigned as part of the > rev_tree merging in couch_db_updater:merge_rev_trees/7 based on the > condition of whether the new rev_tree is equal to the old tree. This > equality is done as a simple Erlang term comparison. If the trees are > not equal a new update_seq is assigned to the #full_doc_info{} record > that is stored in fulldocinfo_by_id_btree. > > During replication it is possible that a document update merges into the > rev_tree internally without creating a leaf. If the terminal node of the > replicated path happens to land on a node with a value of ?REV_MISSING > the new document information will be preferred and replace the > ?REV_MISSING value. > > This preference ends up causing the rev_tree comparison to evaluate to > false which ends up giving this document a new update_seq. Up until this > point everything is fine. We write a bit of extra data (that will be > cleared during compaction) because of a previous bug where we decided to > be cautious and avoid losing data due to a broken rev_tree merging > aglorithm. It is also important to note that this is the place were we > calculate the update_seq to remove from the docinfo_by_seq_tree. > > After this point we get back to couch_db_udpater:update_docs_int/5 where > we eventually call couch_db_updater:new_index_entries/3 which creates > the new entries for the fulldocinfo_by_id_tree and docinfo_by_seq_btree. > At this point we end up creating a #doc_info{} record based on the > leaves in the rev_tree. As you recall, the update that caused the new > update_seq was not a leaf, at this point we create a #doc_info{} record > with an incorrect high_seq member pointing to the update_seq we are > about to remove from the docinfo_by_seq_tree (remember we calculated the > seq to remove before we consulted the leaves). > > The end result is that we remove the same update_seq we insert. This > sets us up for the real bug. The next time we go to update this document > the same procedure is applied. But what happens is at the point we > calculate the seq to remove from docinfo_by_seq_tree, we calculate the > wrong value. Thus when the update continues we remove an update_seq that > doesn't exist in the tree and insert our new seq. But, the seq from the > previous update is still in the tree. Thus, our docinfo_by_seq_tree now > contains two entries pointing at the same document. > > At this point, we run into the observed behavior of this bug that ends > up causing duplicate entries in views which then ends up throwing errors > when the view is compaction. These errors are also particularly nasty > because they bubble up the the couch_view gen_server which crashes and > spiders out crashing every couch_view_group process. That part probably > isn't important though. > > There's a simple test include with the patch to illustrate the behavior > and maintain an assertion that it stays fixed. > > Fixes COUCHDB-1265 > > > Modified: > =A0 =A0couchdb/trunk/share/www/script/test/recreate_doc.js > =A0 =A0couchdb/trunk/src/couchdb/couch_doc.erl > =A0 =A0couchdb/trunk/src/couchdb/couch_key_tree.erl > > Modified: couchdb/trunk/share/www/script/test/recreate_doc.js > URL: http://svn.apache.org/viewvc/couchdb/trunk/share/www/script/test/rec= reate_doc.js?rev=3D1164350&r1=3D1164349&r2=3D1164350&view=3Ddiff > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > --- couchdb/trunk/share/www/script/test/recreate_doc.js (original) > +++ couchdb/trunk/share/www/script/test/recreate_doc.js Fri Sep =A02 04:3= 1:10 2011 > @@ -77,4 +77,45 @@ couchTests.recreate_doc =3D function(debug > =A0 } catch (e) { > =A0 =A0 T(e.error =3D=3D "conflict"); > =A0 } > + > + =A0db.deleteDb(); > + =A0db.createDb(); > + > + =A0// COUCHDB-1265 > + =A0// Resuscitate an unavailable old revision and make sure that it > + =A0// doesn't introduce duplicates into the _changes feed. > + > + =A0var doc =3D {_id: "bar", count: 0}; > + =A0T(db.save(doc).ok); > + =A0var ghost =3D {_id: "bar", _rev: doc._rev, count: doc.count}; > + =A0for(var i =3D 0; i < 2; i++) { > + =A0 =A0doc.count +=3D 1; > + =A0 =A0T(db.save(doc).ok); > + =A0} > + > + =A0// Compact so that the old revision to be resuscitated will be > + =A0// in the rev_tree as ?REV_MISSING > + =A0db.compact(); > + =A0while(db.info().compact_running) {} > + > + =A0// Saving the ghost here puts it back in the rev_tree in such > + =A0// a way as to create a new update_seq but without changing a > + =A0// leaf revision. This would cause the #full_doc_info{} and > + =A0// #doc_info{} records to diverge in their idea of what the > + =A0// doc's update_seq is and end up introducing a duplicate in > + =A0// the _changes feed the next time this doc is updated. > + =A0T(db.save(ghost, {new_edits: false}).ok); > + > + =A0// The duplicate would have been introduce here becuase the #doc_inf= o{} > + =A0// would not have been removed correctly. > + =A0T(db.save(doc).ok); > + > + =A0// And finally assert that there are no duplicates in _changes. > + =A0var req =3D CouchDB.request("GET", "/test_suite_db/_changes"); > + =A0var resp =3D JSON.parse(req.responseText); > + =A0var docids =3D {}; > + =A0for(var i =3D 0; i < resp.results.length; i++) { > + =A0 =A0T(docids[resp.results[i].id] =3D=3D=3D undefined, "Duplicates in= _changes feed."); > + =A0 =A0docids[resp.results[i].id] =3D true; > + =A0} > =A0}; > > Modified: couchdb/trunk/src/couchdb/couch_doc.erl > URL: http://svn.apache.org/viewvc/couchdb/trunk/src/couchdb/couch_doc.erl= ?rev=3D1164350&r1=3D1164349&r2=3D1164350&view=3Ddiff > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > --- couchdb/trunk/src/couchdb/couch_doc.erl (original) > +++ couchdb/trunk/src/couchdb/couch_doc.erl Fri Sep =A02 04:31:10 2011 > @@ -319,10 +319,19 @@ to_doc_info(FullDocInfo) -> > =A0 =A0 {DocInfo, _Path} =3D to_doc_info_path(FullDocInfo), > =A0 =A0 DocInfo. > > -max_seq([], Max) -> > - =A0 =A0Max; > -max_seq([#rev_info{seq=3DSeq}|Rest], Max) -> > - =A0 =A0max_seq(Rest, if Max > Seq -> Max; true -> Seq end). > +max_seq(Tree) -> > + =A0 =A0FoldFun =3D fun({_Pos, _Key}, Value, _Type, MaxOldSeq) -> > + =A0 =A0 =A0 =A0case Value of > + =A0 =A0 =A0 =A0 =A0 =A0{_Deleted, _DiskPos, OldTreeSeq} -> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0% Older versions didn't track data sizes= . > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0erlang:max(MaxOldSeq, OldTreeSeq); > + =A0 =A0 =A0 =A0 =A0 =A0{_Deleted, _DiskPos, OldTreeSeq, _Size} -> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0erlang:max(MaxOldSeq, OldTreeSeq); > + =A0 =A0 =A0 =A0 =A0 =A0_ -> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0MaxOldSeq > + =A0 =A0 =A0 =A0end > + =A0 =A0end, > + =A0 =A0couch_key_tree:fold(FoldFun, 0, Tree). > > =A0to_doc_info_path(#full_doc_info{id=3DId,rev_tree=3DTree}) -> > =A0 =A0 RevInfosAndPath =3D [ > @@ -342,7 +351,7 @@ to_doc_info_path(#full_doc_info{id=3DId,re > =A0 =A0 =A0 =A0 end, RevInfosAndPath), > =A0 =A0 [{_RevInfo, WinPath}|_] =3D SortedRevInfosAndPath, > =A0 =A0 RevInfos =3D [RevInfo || {RevInfo, _Path} <- SortedRevInfosAndPat= h], > - =A0 =A0{#doc_info{id=3DId, high_seq=3Dmax_seq(RevInfos, 0), revs=3DRevI= nfos}, WinPath}. > + =A0 =A0{#doc_info{id=3DId, high_seq=3Dmax_seq(Tree), revs=3DRevInfos}, = WinPath}. > > > > > Modified: couchdb/trunk/src/couchdb/couch_key_tree.erl > URL: http://svn.apache.org/viewvc/couchdb/trunk/src/couchdb/couch_key_tre= e.erl?rev=3D1164350&r1=3D1164349&r2=3D1164350&view=3Ddiff > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > --- couchdb/trunk/src/couchdb/couch_key_tree.erl (original) > +++ couchdb/trunk/src/couchdb/couch_key_tree.erl Fri Sep =A02 04:31:10 20= 11 > @@ -49,7 +49,7 @@ > > =A0-export([merge/3, find_missing/2, get_key_leafs/2, get_full_key_paths/= 2, get/2]). > =A0-export([get_all_leafs/1, count_leafs/1, remove_leafs/2, get_all_leafs= _full/1, stem/2]). > --export([map/2, mapfold/3, map_leafs/2]). > +-export([map/2, mapfold/3, map_leafs/2, fold/3]). > > =A0-include("couch_db.hrl"). > > @@ -320,6 +320,21 @@ count_leafs_simple([{_Key, _Value, SubTr > =A0 =A0 count_leafs_simple(SubTree) + count_leafs_simple(RestTree). > > > +fold(_Fun, Acc, []) -> > + =A0 =A0Acc; > +fold(Fun, Acc0, [{Pos, Tree}|Rest]) -> > + =A0 =A0Acc1 =3D fold_simple(Fun, Acc0, Pos, [Tree]), > + =A0 =A0fold(Fun, Acc1, Rest). > + > +fold_simple(_Fun, Acc, _Pos, []) -> > + =A0 =A0Acc; > +fold_simple(Fun, Acc0, Pos, [{Key, Value, SubTree} | RestTree]) -> > + =A0 =A0Type =3D if SubTree =3D=3D [] -> leaf; true -> branch end, > + =A0 =A0Acc1 =3D Fun({Pos, Key}, Value, Type, Acc0), > + =A0 =A0Acc2 =3D fold_simple(Fun, Acc1, Pos+1, SubTree), > + =A0 =A0fold_simple(Fun, Acc2, Pos, RestTree). > + > + > =A0map(_Fun, []) -> > =A0 =A0 []; > =A0map(Fun, [{Pos, Tree}|Rest]) -> > > >