From: "Adam Kocoloski (JIRA)"
To: dev@couchdb.apache.org
Subject: [jira] Commented: (COUCHDB-968) Duplicated IDs in _all_docs
Date: Tue, 30 Nov 2010 22:10:11 -0500 (EST)
Message-ID: <30526984.40751291173011391.JavaMail.jira@thor>
In-Reply-To: <12364317.322241290769453883.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/COUCHDB-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965555#action_12965555 ]

Adam Kocoloski commented on COUCHDB-968:
----------------------------------------

I've put together a branch [1] that seems to resolve the problem with spurious changes to the revision tree. There are two separate concerns:

a) The revision trees are compared before the new one is stemmed, so if the merge results in a tree with a branch longer than revs_limit it will automatically fail to match the old one. This issue is addressed by [2], which simply passes revs_limit to merge_rev_trees() and stems the new tree before comparing it with the old one.

b) The merging logic occasionally selects values from the inserted tree rather than the old tree. I'm pretty sure that's never a good idea. The old tree contains pointers to the document bodies for all available revisions, while the new tree is typically a fairly empty structure, with ?REV_MISSING for all branch revisions and an unflushed #doc{} record representing the incoming edit. Choosing the value from the InsertTree instead of the one from the old tree for a given key results in a mismatch between the old tree and the merged tree. I'm pretty sure it also causes old branch revisions to become unavailable without compaction even running (*ahem* not that anyone should ever rely on them being available). In [3] I rewrote the merging logic to always choose the value from the old tree for each key shared by the old tree and the inserted tree.

One possible issue with the patch in [3] is that it removes functionality. The key tree previously supported merging a tree with multiple branches into an existing revision tree. We have etap tests exercising that feature, but it's never used in CouchDB proper. It still works to some extent, but that's mostly by accident. I was definitely writing for the case of a linear revision history being merged into a tree (again, that's the only way this code is currently being used in CouchDB). If we need to support full commutative tree merging there's probably a reasonable way to add it. Otherwise I think we need to strip out some of the 060-kt-merging tests, which now overstate the capabilities of couch_key_tree. The new merge code is really merging a path into a tree, not merging two trees.
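To make (a) and (b) concrete, here's a stripped-down toy sketch of the idea, written from scratch for this mail. It is not couch_key_tree and not the code on the branch; the module name, tuple shapes and the `missing` placeholder are invented for the example. It merges an incoming linear path into a stored branch, keeps the stored value for every revision both sides know about, and stems to revs_limit before deciding whether the tree actually changed:

%% toy_rev_merge: a self-contained toy model, NOT couch_key_tree and NOT
%% the code on the branch. A "path" is a single linear branch, represented
%% root-to-leaf as a list of {Pos, RevId, Value}. The ?REV_MISSING macro
%% below is a stand-in for the real one in couch_db.hrl.
-module(toy_rev_merge).
-export([merge/3, demo/0]).

-define(REV_MISSING, missing).

%% Merge an incoming linear path into the stored branch, then stem both
%% sides to RevsLimit *before* deciding whether anything changed (point a).
merge(StoredPath, IncomingPath, RevsLimit) ->
    Merged = stem(merge_path(StoredPath, IncomingPath), RevsLimit),
    case Merged =:= stem(StoredPath, RevsLimit) of
        true  -> {unchanged, Merged};
        false -> {changed, Merged}
    end.

%% Point (b): for every revision known to both sides keep the stored value,
%% which points at a doc body on disk. The incoming side only carries
%% placeholders plus the unwritten edit at its leaf.
merge_path(Stored, Incoming) ->
    StoredMap   = maps:from_list([{{Pos, Rev}, Val} || {Pos, Rev, Val} <- Stored]),
    IncomingMap = maps:from_list([{{Pos, Rev}, Val} || {Pos, Rev, Val} <- Incoming]),
    Keys = lists:usort([{Pos, Rev} || {Pos, Rev, _} <- Stored ++ Incoming]),
    [{Pos, Rev, maps:get({Pos, Rev}, StoredMap,
                         maps:get({Pos, Rev}, IncomingMap, ?REV_MISSING))}
     || {Pos, Rev} <- Keys].

%% Keep only the RevsLimit leaf-most revisions of a linear branch.
stem(Path, RevsLimit) ->
    lists:reverse(lists:sublist(lists:reverse(Path), RevsLimit)).

%% A replicated path re-introduces an ancestor that was already stemmed out
%% locally and carries no bodies of its own; with the two rules above the
%% merge is recognized as a no-op and the stored body pointers survive.
demo() ->
    Stored   = [{2, "b", {body, 2}}, {3, "c", {body, 3}}, {4, "d", {body, 4}}],
    Incoming = [{1, "a", ?REV_MISSING}, {2, "b", ?REV_MISSING},
                {3, "c", ?REV_MISSING}, {4, "d", ?REV_MISSING}],
    {unchanged, Stored} = merge(Stored, Incoming, 3),
    ok.

The real patch applies the same two rules to the actual revision trees, where the value for an available revision points at the document body on disk rather than a {body, N} tuple.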
[1]: https://github.com/kocolosk/couchdb/tree/968-duplicate-seq-entries
[2]: https://github.com/kocolosk/couchdb/commit/eaed064f6113b10a59f05da2497be41c748b175a
[3]: https://github.com/kocolosk/couchdb/commit/09ff2f1b419ab9949e6a690ecda7faffc6c55210

> Duplicated IDs in _all_docs
> ---------------------------
>
>                 Key: COUCHDB-968
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-968
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.10.1, 0.10.2, 0.11.1, 0.11.2, 1.0, 1.0.1, 1.0.2
>         Environment: Ubuntu 10.04
>            Reporter: Sebastian Cohnen
>            Priority: Blocker
>
> We have a database which is causing serious trouble with compaction and replication (huge memory and CPU usage, often causing CouchDB to crash b/c all system memory is exhausted). Yesterday we discovered that db/_all_docs is reporting duplicated IDs (see [1]). Until a few minutes ago we thought there were only a few duplicates, but today I took a closer look and found 10 IDs which sum up to a total of 922 duplicates. Some of them have only 1 duplicate, others have hundreds.
>
> Some facts about the database in question:
> * ~13k documents, with 3-5k revs each
> * all duplicated documents are in conflict (with 1 up to 14 conflicts)
> * compaction is run on a daily basis
> * several thousand updates per hour
> * multi-master setup with pull replication from each other
> * delayed_commits=false on all nodes
> * used CouchDB versions 1.0.0 and 1.0.x (*)
>
> Unfortunately the database's contents are confidential and I'm not allowed to publish them.
>
> [1]: Part of http://localhost:5984/DBNAME/_all_docs
> ...
> {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> ...
>
> [*] There were two (old) servers (1.0.0) in production (already having the replication and compaction issues). Then two servers (1.0.x) were added and replication was set up to bring them in sync with the old production servers, since the two new servers were meant to replace the old ones (to update node.js application code, among other things).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.