Subject: Re: view response with duplicate id's
From: Alexey Loshkarev
To: user@couchdb.apache.org
Date: Thu, 7 Oct 2010 11:27:16 +0200

I think this is database file corruption.
Querying _all_docs returns a lot of duplicates (about 3,000 duplicates
in a database of ~350,000 documents).

[12:17:48 root@node2 (~)]# curl http://localhost:5984/exhaust/_all_docs > all_docs
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 37.7M    0 37.7M    0     0  1210k      0 --:--:--  0:00:31 --:--:--  943k
[12:18:23 root@node2 (~)]# wc -l all_docs
325102 all_docs
[12:18:27 root@node2 (~)]# uniq all_docs |wc -l
322924

Node1 has duplicates too, but far fewer:

[12:18:48 root@node1 (~)]# curl http://localhost:5984/exhaust/_all_docs > all_docs
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38.6M    0 38.6M    0     0   693k      0 --:--:--  0:00:57 --:--:-- 55809
[12:19:57 root@node1 (~)]# wc -l all_docs
332714 all_docs
[12:20:54 root@node1 (~)]# uniq all_docs |wc -l
332523


2010/10/7 Alexey Loshkarev :
> I can't say what specifically it might be, so let's dive into the
> history of this database(s).
>
> First (5-6 weeks ago) there was the node2 server with couchdb v10.1.
> There was a testing database on it.
> There were a lot of structural changes, view updates and so on.
> Then it became production and started working OK.
> Then we realized we needed a backup, ideally an online backup (since
> we have couchdb, we can do this).
> So a node1 server with couchdb 1.0.1 appeared. I replicated node2 to
> node1, then started continuous replication node1 -> node2 and
> node2 -> node1. All clients worked with node2 only. Everything worked
> fine for about a month.
> A few days ago we were at peak load, so I wanted to use node1 and
> node2 simultaneously. This was done by round-robin DNS (host db
> returns 2 different IPs: node1's IP and node2's IP). Everything worked
> fine for about 5 minutes, then I got the first conflict (view
> queues/all returned two identical documents: one was the actual
> version, the second was the conflicted revision, a document with the
> field _conflict="....."). The document ID was q_tsentr.
> As I don't have a conflict resolver yet, I resolved the conflict
> manually by deleting the conflicted revision. I also disabled
> round-robin and moved all load to node2, to avoid conflicts for a
> while until I write a conflict resolver.
>
> It worked OK (node1 and node2 in mutual replication, active load on
> node2) until yesterday.
> Yesterday an operator called me to say he had duplicate data in the
> program. At that point queues/all returned 1 duplicated document, the
> same one as a few days before (id = q_tsentr). One row consisted of
> the actual document version, the other row consisted of an old
> revision with the field _conflicted_revision="some old revision".
>
> I tried to delete this revision, but without success. GET for
> q_tsentr?rev="some old revision" returned a valid document. DELETE
> q_tsentr?rev="some old revision" gave me a 409 error.
> Here are the log files (node2):
>
> [Wed, 06 Oct 2010 12:17:19 GMT] [info] [<0.7239.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:17:30 GMT] [info] [<0.7245.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:17:35 GMT] [info] [<0.7287.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:17:43 GMT] [info] [<0.7345.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:18:02 GMT] [info] [<0.7864.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 409
> [Wed, 06 Oct 2010 12:18:29 GMT] [info] [<0.8331.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:18:39 GMT] [info] [<0.8363.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 409
> [Wed, 06 Oct 2010 12:38:19 GMT] [info] [<0.16765.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:40:40 GMT] [info] [<0.17337.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:40:45 GMT] [info] [<0.17344.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 404
>
> Logs at node1:
>
> [Wed, 06 Oct 2010 12:17:46 GMT] [info] [<0.25979.462>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:17:56 GMT] [info] [<0.26002.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:21:25 GMT] [info] [<0.27133.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=all 404
> [Wed, 06 Oct 2010 12:21:49 GMT] [info] [<0.27179.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?revs=true 404
> [Wed, 06 Oct 2010 12:24:41 GMT] [info] [<0.28959.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?revs=true 404
> [Wed, 06 Oct 2010 12:38:07 GMT] [info] [<0.10362.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?revs=all 404
> [Wed, 06 Oct 2010 12:38:23 GMT] [info] [<0.10534.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:40:25 GMT] [info] [<0.12014.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
> [Wed, 06 Oct 2010 12:40:33 GMT] [info] [<0.12109.463>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 404
>
> So I deleted this document and created a new one (id: q_tsentr2).
> It worked fine for about an hour.
>
> Node2 had an undeletable duplicate, so I moved all clients to node1.
> There was no such problem there; the view response was correct.
>
> Then I tried to recover the database at node2. I stopped couchdb,
> deleted the view index files and started couchdb again. Then I pinged
> all views to recreate the indexes.
> At the end of this procedure, I saw duplicates of identical rows (see
> the first message in this thread). Node1 had no such problems, so I
> stopped replication, left the load on node1 and went crying to this
> mailing list.
>
>
> 2010/10/6 Paul Davis :
>> It was noted on IRC that I should give a bit more explanation.
>>
>> With the information that you've provided there are two possible
>> explanations. Either your client code is not doing what you expect or
>> you've triggered a really crazy bug in the view indexer that caused it
>> to reindex a database without invalidating a view and not removing
>> keys for docs when it reindexed.
>>
>> Given that no one has reported anything remotely like this and I can't
>> immediately see a code path that would violate so many behaviours in
>> the view updater, I'm leaning towards this being an issue in the
>> client code.
>>
>> If there was something specific that changed since the view worked,
>> that might illuminate what could cause this sort of behaviour if it is
>> indeed a bug in CouchDB.
>>
>> HTH,
>> Paul Davis
>>
>> On Wed, Oct 6, 2010 at 12:24 PM, Alexey Loshkarev wrote:
>>> I have this view function (map only, without reduce):
>>>
>>> function(doc) {
>>>   if (doc.type == "queue") {
>>>     emit(doc.ordering, doc.drivers);
>>>   }
>>> }
>>>
>>> It worked perfectly until yesterday, but today it started returning
>>> duplicates.
>>> Example:
>>> $ curl http://node2:5984/exhaust/_design/queues/_view/all
>>>
>>> {"total_rows":46,"offset":0,"rows":[
>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>> ......
>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>> ........
>>> {"id":"q_otstoj","key":11,"value":["d_gavrilenko_aleksandr","d_klishnev_sergej"]}
>>> ]}
>>>
>>>
>>> I tried restarting the server, recreating the view (removing the view
>>> index file), and compacting the view and the database. None of this
>>> helped; it still returns duplicates.
>>> What is happening? How can I avoid it in the future?
>>>
>>> --
>>> ----------------
>>> Best regards
>>> Alexey Loshkarev
>>> mailto:elf2001@gmail.com
>>>
>>
>
> --
> ----------------
> Best regards
> Alexey Loshkarev
> mailto:elf2001@gmail.com
>

--
----------------
Best regards
Alexey Loshkarev
mailto:elf2001@gmail.com
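
A note on checking for the duplicates discussed in this thread: the map function quoted above emits at most one row per document, so any (id, key) pair that shows up more than once in the view response is unexpected. The following is only a minimal sketch of how a client could detect that. The function name findDuplicateRows and the sample data are hypothetical (the sample is shaped like the queues/all output quoted above, with values shortened); it assumes the response body has already been parsed from JSON. It should work on an _all_docs response as well, since each row there also carries an id and a key.

```javascript
// Hypothetical helper, not part of CouchDB: report every (id, key) pair
// that occurs more than once in a parsed view or _all_docs response.
// `response` is assumed to be an object with a `rows` array of
// {id, key, value} objects.
function findDuplicateRows(response) {
  const counts = new Map();
  for (const row of response.rows) {
    // Serialize the pair so that it can be used as a Map key.
    const pair = JSON.stringify([row.id, row.key]);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  const duplicates = [];
  for (const [pair, count] of counts) {
    if (count > 1) {
      const [id, key] = JSON.parse(pair);
      duplicates.push({ id, key, count });
    }
  }
  return duplicates;
}

// Sample data for illustration only (values shortened).
const sample = {
  total_rows: 4,
  offset: 0,
  rows: [
    { id: "q_oblasnaya", key: 2, value: ["d_kramarenko_viktor"] },
    { id: "q_oblasnaya", key: 2, value: ["d_kramarenko_viktor"] },
    { id: "q_oblasnaya", key: 2, value: ["d_kramarenko_viktor"] },
    { id: "q_otstoj", key: 11, value: ["d_gavrilenko_aleksandr"] },
  ],
};

console.log(JSON.stringify(findDuplicateRows(sample)));
// prints [{"id":"q_oblasnaya","key":2,"count":3}]
```

Unlike piping the output through uniq, this does not depend on duplicate rows being adjacent. Note that a view whose map function calls emit several times with the same key for one document would legitimately produce repeated pairs, so the check only makes sense for views like the one in this thread.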