couchdb-dev mailing list archives

From "Paul Joseph Davis (JIRA)" <>
Subject [jira] Commented: (COUCHDB-968) Duplicated IDs in _all_docs
Date Mon, 29 Nov 2010 06:05:39 GMT


Paul Joseph Davis commented on COUCHDB-968:

So I spent some time today tracking this down. Here are some notes. 

The multiple entries in _all_docs are a bit of a red herring. Yes, it's something we should investigate
preventing in the future, but it's just an expression of the underlying cause.

What happens is that somehow multiple update_seq entries are getting inserted into the database's
update_seq btree for a single document id. When compaction runs, it just iterates over this
btree and writes the docs to disk, which means it will write the same doc multiple times to the
new file. If we write multiple rows in a single btree query_modify call, it's possible that we
end up with multiple rows with identical keys (which is bad).
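A toy model of that failure mode, as a sketch under stated assumptions rather than CouchDB's code: the by-seq index is a plain dict from update_seq to doc id, and `bulk_add_sorted` is a hypothetical stand-in for a batched btree insert that, like the case described above, does not deduplicate keys inside a single batch.

```python
from bisect import insort


def bulk_add_sorted(rows, batch):
    """Hypothetical batched insert: merges (key, value) pairs into a sorted
    list without checking for duplicate keys inside the batch. A real btree
    add would normally replace an existing key; this toy deliberately doesn't."""
    for key, value in batch:
        insort(rows, (key, value))
    return rows


def compact(by_seq):
    """Walk the by-seq index in seq order and copy every referenced doc
    into a fresh id-keyed index using one batched insert."""
    batch = [(doc_id, seq) for seq, doc_id in sorted(by_seq.items())]
    return bulk_add_sorted([], batch)


# A corrupted by-seq index: doc "9999" is listed under two update_seqs.
by_seq = {10: "9997", 11: "9999", 12: "9999"}
by_id = compact(by_seq)
print([doc_id for doc_id, _ in by_id])  # ['9997', '9999', '9999']
```

Once the source index holds two seqs for one id, the compacted id index inherits both rows, which is what then shows up in _all_docs.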

The real issue is how we end up with multiple update_seq entries for a given doc id. This
is where replication and rev_stemming come in. Once a document's revision history has exceeded
the revs limit, there's apparently a way for two update_seqs to get inserted. After some digging,
I've found that couch_db_updater:update_docs_int ends up trying to
remove an update_seq that doesn't exist. Once this happens, we have two update_seqs for a
single doc id.
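The invariant at stake is that each doc id owns exactly one entry in the by-seq index: on every update, the old entry (found via the update_seq cached in the doc's full_doc_info) is removed and a new one is added. The toy model below is a hypothetical sketch, not couch_db_updater itself; `FullDocInfo`, `update_doc` and `seqs_for` are illustrative names. It shows how a stale cached seq turns the removal into a silent no-op and leaves two entries behind.

```python
from dataclasses import dataclass


@dataclass
class FullDocInfo:
    doc_id: str
    update_seq: int  # the seq this record believes is current in by_seq


def update_doc(by_seq, by_id, doc_id, new_seq):
    """Hypothetical update step: drop the old by-seq entry, add the new one."""
    info = by_id.get(doc_id)
    if info is not None:
        # pop() with a default mirrors removing a missing btree key:
        # nothing happens and no error is raised.
        by_seq.pop(info.update_seq, None)
    by_seq[new_seq] = doc_id
    by_id[doc_id] = FullDocInfo(doc_id, new_seq)


def seqs_for(by_seq, doc_id):
    """All seqs in the by-seq index that point at doc_id."""
    return sorted(s for s, d in by_seq.items() if d == doc_id)


by_seq = {5: "9999"}
by_id = {"9999": FullDocInfo("9999", 5)}
update_doc(by_seq, by_id, "9999", 6)
print(seqs_for(by_seq, "9999"))  # [6]: invariant holds

# Simulate the bug: full_doc_info carries a seq that is newer than the
# entry actually stored in by_seq, so the removal misses.
by_id["9999"].update_seq = 7
update_doc(by_seq, by_id, "9999", 8)
print(seqs_for(by_seq, "9999"))  # [6, 8]: two entries for one doc id
```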

So the next question is how we screw up figuring out which update_seq to delete.

The code in question appears to be trying to delete the previous update_seq, which it
takes from the full_doc_info record. At this point my understanding of the exact sequence of events gets
a bit hazy, so bear with me.

What I think is happening is that a document with a full revision history gets written out
due to an interactive edit (i.e., one that would fail with a conflict). Then, when the replicator
attempts to write (in a manner that merges key trees, i.e., where no conflicts are possible),
it gets a bit confused. For instance:

Suppose the interactive edit resulted in a revision history of B-C-D, and the replicator then attempts
to write a doc with history A-B-C: it gets confused about whether it's writing a new doc or not.
At this point I get a bit lost. Somehow a second edit comes in, and the update_seq on the
full_doc_info record that gets looked up is newer than it should be, whereas the entry in
the update_seq btree is older. Hence duplicate rows, and hence compaction producing multiple docs
in _all_docs.

Etc etc. 
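The B-C-D / A-B-C case can be modeled with linear histories. This is a toy sketch with assumed names (`merge_paths`, `stem`) and a toy revs limit of 3, not couch_key_tree: the replicated A-B-C overlaps the stored B-C-D, so merging re-attaches the stemmed-off ancestor A, and stemming back to the limit discards it again. The leaf never changes, so whether the result looks "new" depends on when you compare, which is one plausible reading of the confusion described above.

```python
from typing import List, Optional

RevPath = List[str]  # toy linear revision history, oldest revision first


def merge_paths(existing: RevPath, incoming: RevPath) -> Optional[RevPath]:
    """Merge two linear histories that share at least one revision.

    Returns None if they share nothing or disagree once aligned (a real
    key tree would then keep separate branches instead of merging)."""
    shared = next((rev for rev in incoming if rev in existing), None)
    if shared is None:
        return None
    # Offset that maps an index in `incoming` onto the same rev in `existing`.
    offset = existing.index(shared) - incoming.index(shared)
    start = min(0, offset)
    end = max(len(existing), len(incoming) + offset)
    merged = []
    for pos in range(start, end):
        in_existing = 0 <= pos < len(existing)
        in_incoming = 0 <= pos - offset < len(incoming)
        if in_existing and in_incoming and existing[pos] != incoming[pos - offset]:
            return None  # histories diverge: a branch, not a simple merge
        merged.append(existing[pos] if in_existing else incoming[pos - offset])
    return merged


def stem(path: RevPath, revs_limit: int) -> RevPath:
    """Keep only the newest `revs_limit` revisions (toy rev stemming)."""
    return path[-revs_limit:]


old_tree = ["B", "C", "D"]  # stemmed history left by the interactive edit
incoming = ["A", "B", "C"]  # older, longer history sent by the replicator
merged = merge_paths(old_tree, incoming)
print(merged)                       # ['A', 'B', 'C', 'D']
print(stem(merged, 3) == old_tree)  # True
```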

I'm flying tomorrow, so I'll have more time to investigate the exact consequences of these
various bits if no one beats me to it. If someone wants to take a crack at this, the next
place to start digging is at the bottom of couch_db_updater:merge_rev_trees, where it attempts
to compare the new and old revision trees to decide whether it should update the update_seq
in the full_doc_info record. Specifically, I think we need to re-evaluate the NewRevTree ==
OldTree comparison in the last if-statement, as it appears the absolute root cause of this
bug is that comparison evaluating to false when it should be true.
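The kind of decision that final if-statement makes can be sketched as follows. `decide_seq` is a hypothetical simplification, not the real merge_rev_trees, and the paths are hard-coded from the B-C-D / A-B-C example with a toy revs limit of 3. It illustrates one hypothesized way the equality check can report a change when nothing meaningful happened: comparing the merged tree before it is stemmed.

```python
from typing import List, Tuple

RevPath = List[str]  # toy revision history, oldest revision first


def stem(path: RevPath, revs_limit: int) -> RevPath:
    """Keep only the newest `revs_limit` revisions (toy rev stemming)."""
    return path[-revs_limit:]


def decide_seq(old_tree: RevPath, new_tree: RevPath,
               old_seq: int, next_seq: int) -> Tuple[int, bool]:
    """Hypothetical final check: only a changed tree earns a new update_seq.
    Returns (seq to store in full_doc_info, whether the tree changed)."""
    if new_tree == old_tree:
        return old_seq, False
    return next_seq, True


old_tree = ["B", "C", "D"]     # after the interactive edit
merged = ["A", "B", "C", "D"]  # old tree merged with the replicated A-B-C
print(decide_seq(old_tree, merged, 41, 42))           # (42, True)
print(decide_seq(old_tree, stem(merged, 3), 41, 42))  # (41, False)
```

Under these assumptions, the unstemmed comparison bumps the seq even though the stored tree ends up identical, which matches the symptom of the comparison "evaluating false when it should be true".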

> Duplicated IDs in _all_docs
> ---------------------------
>                 Key: COUCHDB-968
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.10.1, 0.10.2, 0.11.1, 0.11.2, 1.0, 1.0.1, 1.0.2
>         Environment: Ubuntu 10.04.
>            Reporter: Sebastian Cohnen
>            Priority: Blocker
> We have a database that is causing serious trouble with compaction and replication
> (huge memory and CPU usage, often causing CouchDB to crash because all system memory is exhausted).
> Yesterday we discovered that db/_all_docs is reporting duplicated IDs (see [1]). Until a few
> minutes ago we thought there were only a few duplicates, but today I took a closer look and
> found 10 IDs which sum up to a total of 922 duplicates. Some of them have only 1 duplicate,
> others have hundreds.
> Some facts about the database in question:
> * ~13k documents, with 3-5k revs each
> * all duplicated documents are in conflict (with 1 up to 14 conflicts)
> * compaction is run on a daily basis
> * several thousands updates per hour
> * multi-master setup with pull replication from each other
> * delayed_commits=false on all nodes
> * used couchdb versions 1.0.0 and 1.0.x (*)
> Unfortunately the database's contents are confidential and I'm not allowed to publish
> [1]: Part of http://localhost:5984/DBNAME/_all_docs
> ...
> {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> ...
> [*]
> There were two (old) servers (1.0.0) in production (already having the replication and
> compaction issues). Then two servers (1.0.x) were added and replication was set up to bring
> them in sync with the old production servers, since the two new servers were meant to replace
> the old ones (to update node.js application code among other things).
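A tally like the reporter's (10 IDs, 922 duplicates) can be reproduced from a saved _all_docs body. This sketch assumes the response has already been fetched and stored as a string; it uses the three rows from the quoted excerpt in a minimal `{"rows": [...]}` wrapper (a real response also carries `total_rows` and `offset`), and `duplicate_ids` is an illustrative helper name. It counts every row beyond the first for an id as one duplicate, which is one interpretation of the report's total.

```python
import json
from collections import Counter


def duplicate_ids(all_docs_json: str) -> dict:
    """Return {doc_id: extra_copies} for ids that appear more than once in
    the "rows" of an _all_docs response body."""
    rows = json.loads(all_docs_json)["rows"]
    counts = Counter(row["id"] for row in rows)
    return {doc_id: n - 1 for doc_id, n in counts.items() if n > 1}


sample = """{"rows": [
 {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
 {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
 {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}}
]}"""
dups = duplicate_ids(sample)
print(dups)                # {'9999': 1}
print(sum(dups.values()))  # 1
```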

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
