From: "Paul Joseph Davis (JIRA)"
To: dev@couchdb.apache.org
Date: Tue, 30 Nov 2010 01:23:11 -0500 (EST)
Subject: [jira] Commented: (COUCHDB-968) Duplicated IDs in _all_docs

    [ https://issues.apache.org/jira/browse/COUCHDB-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965114#action_12965114 ]

Paul Joseph Davis commented on COUCHDB-968:
-------------------------------------------

Sorry for the delay; my flight got cancelled, I got rerouted, and I didn't make it home until just now.

I'm not sure I quite follow what you mean by uncompacted here. Post compaction, when we see the issue in _all_docs, I would expect the duplicates to all have the same update_seq. Pre compaction, in _changes, I would expect the same _revision (I think, just guessing) because it's just iterating the by_seqid_btree and then displaying the update_seq from the actual #full_doc_info (again, just guessing).

As Bob Dionne noted in #couchdb, it's not entirely clear where the actual bug is. Right now it's basically a combination of three things: couch_key_tree:stem kinda sorta fails when merging two revision lists that exceed the rev_limit setting; once that fails, we hit another issue that results in two entries in the by_seqid_btree; and then finally, compaction copies multiple docs into the actual by_docid_btree.

After musing on it during the copious amounts of queueing I managed to accomplish today, I think we should treat them as three bugs right now. My proposed fixes are basically these:

1. Fix couch_key_tree:stem so that it takes into account the case where the incoming write's revision path has a suffix that is a prefix of an existing edit path. This would avoid the rewrite that fixes everything.

2. We need to figure out a way to fix the breakage of the update_seq. It's a bit nebulous whether this is an actual bug, as the solution to #1 would fix all known occurrences of it. I think the proper fix would be to revisit couch_db_updater:merge_rev_trees and figure out a better way of picking the new update_seq, which would basically need to detect whether an edit leaf was changed and only then update the update_seq.
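To make #2 a bit more concrete, the shape of the check I have in mind is roughly this (completely untested sketch; the function name and arguments are invented for illustration, this is not the existing merge_rev_trees code):

    %% Illustrative only: pick the update_seq to record for a doc after a rev
    %% tree merge. OldLeafRevs/NewLeafRevs are the {Pos, RevId} pairs at the
    %% tree's edit leaves before and after the merge.
    choose_update_seq(OldLeafRevs, NewLeafRevs, OldSeq, NewSeq) ->
        case lists:sort(OldLeafRevs) =:= lists:sort(NewLeafRevs) of
            true ->
                %% no edit leaf changed, so leave the doc at its old update_seq
                OldSeq;
            false ->
                %% an edit leaf changed, so record the new update_seq
                NewSeq
        end.

Where exactly that would plug into merge_rev_trees is the part I still need to stare at.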
3. Our btree implementation should probably check harder for the possibility of adding duplicate keys. The basic bug is that it's possible to do so within a single call to query_modify. A simple solution that I've implemented (one that would affect all calls to query_modify) is to check the input list of actions for duplicates, i.e. just iterate over the Actions list and find duplicate {Action, Key, _Value} tuples, ignoring differing values. Alternatively, a check deep down in modify_kvnode could require each Action/Key to sort above the last entry appended to ResultNode and discard any that don't, thereby selecting one of the duplicate actions semi-randomly (or, alternatively, throw an error instead of discarding). Technically, I think both are O(N) with N the size of the list of Actions that were requested. (A rough sketch of the Actions-list check is at the bottom of this mail, below the quoted issue.)

That is all. I'll look more tomorrow. Right now it's time for beer and a bit of zoning out in front of the telly before I pass out.

> Duplicated IDs in _all_docs
> ---------------------------
>
> Key: COUCHDB-968
> URL: https://issues.apache.org/jira/browse/COUCHDB-968
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.10.1, 0.10.2, 0.11.1, 0.11.2, 1.0, 1.0.1, 1.0.2
> Environment: Ubuntu 10.04
> Reporter: Sebastian Cohnen
> Priority: Blocker
>
> We have a database which is causing serious trouble with compaction and replication (huge memory and CPU usage, often causing CouchDB to crash because all system memory is exhausted). Yesterday we discovered that db/_all_docs is reporting duplicated IDs (see [1]). Until a few minutes ago we thought there were only a few duplicates, but today I took a closer look and found 10 IDs which sum up to a total of 922 duplicates. Some of them have only 1 duplicate, others have hundreds.
>
> Some facts about the database in question:
> * ~13k documents, with 3-5k revs each
> * all duplicated documents are in conflict (with 1 up to 14 conflicts)
> * compaction is run on a daily basis
> * several thousand updates per hour
> * multi-master setup with pull replication from each other
> * delayed_commits=false on all nodes
> * used CouchDB versions 1.0.0 and 1.0.x (*)
>
> Unfortunately the database's contents are confidential and I'm not allowed to publish them.
>
> [1]: Part of http://localhost:5984/DBNAME/_all_docs
> ...
> {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> ...
>
> [*] There were two (old) servers (1.0.0) in production (already having the replication and compaction issues). Then two servers (1.0.x) were added and replication was set up to bring them in sync with the old production servers, since the two new servers were meant to replace the old ones (to update the node.js application code, among other things).
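For reference, here is roughly what I mean in #3 by checking the Actions list for duplicates (untested sketch; the helper name is made up and this is not the current couch_btree code):

    %% Illustrative only: drop any action whose {Action, Key} pair has already
    %% been seen, ignoring differing values, so a single query_modify call
    %% never hands modify_kvnode the same key twice. O(N) in the number of
    %% actions.
    dedupe_actions(Actions) ->
        dedupe_actions(Actions, sets:new(), []).

    dedupe_actions([], _Seen, Acc) ->
        lists:reverse(Acc);
    dedupe_actions([{Action, Key, _Value} = A | Rest], Seen, Acc) ->
        case sets:is_element({Action, Key}, Seen) of
            true ->
                %% duplicate {Action, Key}: keep the first occurrence
                %% (or throw({duplicate_action, Key}) here instead)
                dedupe_actions(Rest, Seen, Acc);
            false ->
                dedupe_actions(Rest, sets:add_element({Action, Key}, Seen),
                               [A | Acc])
        end.

Dropping silently picks a winner semi-randomly, same as the modify_kvnode variant; throwing would at least surface the caller's bug.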