couchdb-dev mailing list archives

From "Bob Dionne (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-968) Duplicated IDs in _all_docs
Date Sat, 27 Nov 2010 20:13:38 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964448#action_12964448
] 

Bob Dionne commented on COUCHDB-968:
------------------------------------

I've just created a small test that resulted in over 1K dups in the database. Perhaps this
is abuse of couchdb but here's the test:

1. create 3 dbs and start continuous replications db1 -> db2 -> db3 -> db1 in a ring.

2. add doc1 to db1

3. update doc1 N times where N is large

4. kill the client that's sending the updates (my test is erlang using ibrowse)

db1, which is where the updates are going, now has 1085 dups of doc1.
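
For anyone who wants to try steps 1-3, here is a rough sketch in Python. It is not the original test (that one is Erlang using ibrowse). It assumes a CouchDB server at http://localhost:5984, and the helper names and the "n" field in the doc body are made up for illustration. Killing the client (step 4) is not modelled; interrupt the script by hand.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumed server address


def ring_replications(dbs):
    """Return one continuous-replication request body per edge of the ring."""
    return [
        {"source": src, "target": dbs[(i + 1) % len(dbs)], "continuous": True}
        for i, src in enumerate(dbs)
    ]


def request(method, path, body=None):
    """Send a JSON request to CouchDB and return the decoded response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        COUCH + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run(n_updates, dbs=("db1", "db2", "db3")):
    """Steps 1-3 of the test; raises urllib.error.HTTPError on any failure."""
    for db in dbs:  # step 1: create the dbs
        request("PUT", "/" + db)
    for body in ring_replications(list(dbs)):  # step 1: start the ring
        request("POST", "/_replicate", body)
    rev = request("PUT", "/db1/doc1", {"n": 0})["rev"]  # step 2: add doc1
    for i in range(1, n_updates + 1):  # step 3: update doc1 N times
        rev = request("PUT", "/db1/doc1", {"_rev": rev, "n": i})["rev"]
```

Calling run(N) with a large N and interrupting it partway through roughly corresponds to the steps above. Whether it reproduces the dups will depend on timing and on the CouchDB version.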

I originally tried this with lower values of revs limit and noticed that doc1 in db1 would
always end up with only one or two revisions, whereas db2 and db3 would have the full number,
e.g. 10.

Obviously I shouldn't be doing this, but this is somewhat simpler than @tisba's case where
there are N choose 2 replications, loads of updates and daily compactions.

In any event, couchdb shouldn't let me do these things through the APIs.


> Duplicated IDs in _all_docs
> ---------------------------
>
>                 Key: COUCHDB-968
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-968
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0, 1.0.1, 1.0.2
>         Environment: Ubuntu 10.04.
>            Reporter: Sebastian Cohnen
>
> We have a database which is causing serious trouble with compaction and replication
> (huge memory and CPU usage, often causing couchdb to crash because all system memory is
> exhausted). Yesterday we discovered that db/_all_docs is reporting duplicated IDs (see [1]).
> Until a few minutes ago we thought that there were only a few duplicates, but today I took a
> closer look and found 10 IDs which sum up to a total of 922 duplicates. Some of them have
> only 1 duplicate, others have hundreds.
> Some facts about the database in question:
> * ~13k documents, with 3-5k revs each
> * all duplicated documents are in conflict (with 1 up to 14 conflicts)
> * compaction is run on a daily basis
> * several thousand updates per hour
> * multi-master setup with pull replication from each other
> Unfortunately the database's contents are confidential and I'm not allowed to publish it.
> [1]: Part of http://localhost:5984/DBNAME/_all_docs
> ...
> {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> ...
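
A quick way to count such duplicates is to tally the "id" field across the _all_docs rows. This is only a sketch: duplicate_ids is a made-up helper name, and the rows below are the three sample rows quoted above rather than a real database dump.

```python
from collections import Counter


def duplicate_ids(rows):
    """Map each id that appears more than once to its number of extra copies."""
    counts = Counter(row["id"] for row in rows)
    return {doc_id: n - 1 for doc_id, n in counts.items() if n > 1}


# The three rows quoted in [1].
rows = [
    {"id": "9997", "key": "9997",
     "value": {"rev": "6096-603c68c1fa90ac3f56cf53771337ac9f"}},
    {"id": "9999", "key": "9999",
     "value": {"rev": "6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
    {"id": "9999", "key": "9999",
     "value": {"rev": "6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
]

print(duplicate_ids(rows))  # {'9999': 1}
```

Against a live database, rows would be the "rows" array of the JSON returned by GET /DBNAME/_all_docs.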

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

