incubator-couchdb-user mailing list archives

From Alexey Loshkarev <elf2...@gmail.com>
Subject Re: view response with duplicate id's
Date Thu, 07 Oct 2010 09:48:13 GMT
P.S. dmesg doesn't show any hardware problems (bad blocks, segfaults
and so on).
P.P.S. I think I migrated 0.10.1 -> 1.0.1 without replicating the
databases, so it may be my fault.

2010/10/7 Alexey Loshkarev <elf2001@gmail.com>:
> I think this is database file corruption. Querying _all_docs returns
> a lot of duplicates (about 3,000 duplicates in a ~350,000-document
> database).
>
>
> [12:17:48 root@node2 (~)]# curl
> http://localhost:5984/exhaust/_all_docs > all_docs
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
> 100 37.7M    0 37.7M    0     0  1210k      0 --:--:--  0:00:31 --:--:--  943k
> [12:18:23 root@node2 (~)]# wc -l all_docs
> 325102 all_docs
> [12:18:27 root@node2 (~)]# uniq all_docs |wc -l
> 322924
>
>
> Node1 has duplicates too, but far fewer:
> [12:18:48 root@node1 (~)]# curl
> http://localhost:5984/exhaust/_all_docs > all_docs
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
> 100 38.6M    0 38.6M    0     0   693k      0 --:--:--  0:00:57 --:--:-- 55809
> [12:19:57 root@node1 (~)]# wc -l all_docs
> 332714 all_docs
> [12:20:54 root@node1 (~)]# uniq all_docs |wc -l
> 332523
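>
> (If it helps anyone reproduce this: the duplicated rows themselves can be
> listed with something like the following - a rough sketch that only assumes
> the one-row-per-line _all_docs output shown above:
>
>   curl -s http://localhost:5984/exhaust/_all_docs | sort | uniq -d | head
> )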
>
>
>
> 2010/10/7 Alexey Loshkarev <elf2001@gmail.com>:
>> I can't say what specifically it might be, so let me dive into the
>> history of this database.
>>
>> At first (about 5-6 weeks ago) there was only the node2 server, running
>> CouchDB 0.10.1. It hosted a testing database. There were a lot of structural
>> changes, view updates and so on.
>> Then it became production and started working fine.
>> Then we realized we needed a backup, ideally an online backup (since we
>> run CouchDB, we can do this).
>> So the node1 server appeared, with CouchDB 1.0.1. I replicated node2
>> to node1, then started continuous replication node1 -> node2 and
>> node2 -> node1. All clients work with node2 only. Everything worked fine
>> for about a month.
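>>
>> (For context, the continuous replication was set up via the _replicate
>> endpoint, roughly like this - a sketch from memory; node1/node2 are just
>> how I refer to the two servers:
>>
>>   # node2 -> node1
>>   curl -X POST http://node1:5984/_replicate \
>>        -H 'Content-Type: application/json' \
>>        -d '{"source":"http://node2:5984/exhaust","target":"exhaust","continuous":true}'
>>
>>   # plus the same request in the opposite direction for node1 -> node2
>> )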
>> A few days ago we were at peak load, so I wanted to use node1 and
>> node2 simultaneously. This was done with round-robin DNS (host db
>> returns 2 different IPs - node1's IP and node2's IP). Everything worked
>> fine for about 5 minutes, then I got the first conflict (the view
>> queues/all returned two identical documents, one the actual version, the
>> second a conflicted revision, i.e. a document with the field
>> _conflict="....."). The document ID was q_tsentr.
>> As I don't have a conflict resolver yet, I resolved the conflict manually
>> by deleting the conflicted revision. I also disabled round-robin and moved
>> all load to node2 to avoid conflicts for a while, until I could write a
>> conflict resolver.
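>>
>> (The manual fix was roughly this - a sketch; the losing revision is
>> whatever shows up in the document's _conflicts list:
>>
>>   # see the conflicting revisions for the document
>>   curl 'http://node2:5984/exhaust/q_tsentr?conflicts=true'
>>   # then delete the losing revision explicitly
>>   curl -X DELETE 'http://node2:5984/exhaust/q_tsentr?rev=<rev from _conflicts>'
>> )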
>>
>> It worked OK (node1 and node2 in mutual replication, active load on
>> node2) until yesterday.
>> Yesterday an operator called me to say he was seeing duplicate data in the
>> program. At that point queues/all returned 1 duplicated document - the same
>> one as a few days before (id = q_tsentr). One row contained the actual
>> document version, the other row contained an old revision with the field
>> _conflicted_revision="some old revision".
>>
>> I tried to delete this revision, but without success. A GET for
>> q_tsentr?rev="some old revision" returned a valid document. A DELETE of
>> q_tsentr?rev="some old revision" gave me a 409 error.
>> Here are the log entries (node2):
>>
>> [Wed, 06 Oct 2010 12:17:19 GMT] [info] [<0.7239.1462>] 10.0.0.41 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:17:30 GMT] [info] [<0.7245.1462>] 10.0.0.41 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:17:35 GMT] [info] [<0.7287.1462>] 10.0.0.41 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:17:43 GMT] [info] [<0.7345.1462>] 10.0.0.41 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:18:02 GMT] [info] [<0.7864.1462>] 10.0.0.41 - -
>> 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229
>> 409
>> [Wed, 06 Oct 2010 12:18:29 GMT] [info] [<0.8331.1462>] 10.0.0.41 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:18:39 GMT] [info] [<0.8363.1462>] 10.0.0.41 - -
>> 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229
>> 409
>> [Wed, 06 Oct 2010 12:38:19 GMT] [info] [<0.16765.1462>] 10.0.0.41 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:40:40 GMT] [info] [<0.17337.1462>] 10.0.0.41 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:40:45 GMT] [info] [<0.17344.1462>] 10.0.0.41 - -
>> 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229
>> 404
>>
>> Logs at node1:
>>
>> [Wed, 06 Oct 2010 12:17:46 GMT] [info] [<0.25979.462>] 10.20.20.13 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:17:56 GMT] [info] [<0.26002.462>] 10.20.20.13 - -
>> 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229
>> 200
>> [Wed, 06 Oct 2010 12:21:25 GMT] [info] [<0.27133.462>] 10.20.20.13 - -
>> 'DELETE' /exhaust/q_tsentr?rev=all 404
>> [Wed, 06 Oct 2010 12:21:49 GMT] [info] [<0.27179.462>] 10.20.20.13 - -
>> 'DELETE' /exhaust/q_tsentr?revs=true 404
>> [Wed, 06 Oct 2010 12:24:41 GMT] [info] [<0.28959.462>] 10.20.20.13 - -
>> 'DELETE' /exhaust/q_tsentr?revs=true 404
>> [Wed, 06 Oct 2010 12:38:07 GMT] [info] [<0.10362.463>] 10.20.20.13 - -
>> 'GET' /exhaust/q_tsentr?revs=all 404
>> [Wed, 06 Oct 2010 12:38:23 GMT] [info] [<0.10534.463>] 10.20.20.13 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:40:25 GMT] [info] [<0.12014.463>] 10.20.20.13 - -
>> 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:40:33 GMT] [info] [<0.12109.463>] 10.20.20.13 - -
>> 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229
>> 404
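>>
>> (To compare exactly which revisions each node still holds, something like
>> this should work - a sketch I haven't re-run on these servers:
>>
>>   # revision history as seen by each node
>>   curl 'http://node1:5984/exhaust/q_tsentr?revs_info=true'
>>   # all leaf revisions, including conflicts, as JSON
>>   curl 'http://node2:5984/exhaust/q_tsentr?open_revs=all' -H 'Accept: application/json'
>> )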
>>
>> So I deleted this document and created a new one (id - q_tsentr2).
>> It worked fine for about an hour.
>>
>> Node2 still had the undeletable duplicate, so I moved all clients to node1.
>> There was no such problem there; the view response was correct.
>>
>> Then I tried to recover the database on node2. I stopped CouchDB, deleted
>> the view index files and started CouchDB again. Then I queried every view to
>> rebuild the indexes.
>> At the end of this procedure I saw duplicated identical rows (see the
>> first message in this thread). Node1 has no such problems, so I stopped
>> replication, left the load on node1 and came crying to this mailing list.
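>>
>> (For reference, the view reset on node2 was roughly this - the index path
>> is an assumption; it depends on your view_index_dir setting:
>>
>>   /etc/init.d/couchdb stop
>>   rm /var/lib/couchdb/.exhaust_design/*.view
>>   /etc/init.d/couchdb start
>>   # query each view once to rebuild its index
>>   curl 'http://localhost:5984/exhaust/_design/queues/_view/all' > /dev/null
>> )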
>>
>>
>> 2010/10/6 Paul Davis <paul.joseph.davis@gmail.com>:
>>> It was noted on IRC that I should give a bit more explanation.
>>>
>>> With the information that you've provided there are two possible
>>> explanations. Either your client code is not doing what you expect or
>>> you've triggered a really crazy bug in the view indexer that caused it
>>> to reindex a database without invalidating a view and not removing
>>> keys for docs when it reindexed.
>>>
>>> Given that no one has reported anything remotely like this and I can't
>>> immediately see a code path that would violate so many behaviours in
>>> the view updater, I'm leaning towards this being an issue in the
>>> client code.
>>>
>>> If there was something specific that changed since the view worked,
>>> that might illuminate what could cause this sort of behaviour if it is
>>> indeed a bug in CouchDB.
>>>
>>> HTH,
>>> Paul Davis
>>>
>>> On Wed, Oct 6, 2010 at 12:24 PM, Alexey Loshkarev <elf2001@gmail.com> wrote:
>>>> I have the following view function (map only, no reduce):
>>>>
>>>> function(doc) {
>>>>  if (doc.type == "queue") {
>>>>    emit(doc.ordering, doc.drivers);
>>>>  }
>>>> }
>>>>
>>>> It worked perfectly until yesterday, but today it started returning duplicates.
>>>> Example:
>>>> $ curl http://node2:5984/exhaust/_design/queues/_view/all
>>>>
>>>> {"total_rows":46,"offset":0,"rows":[
>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>> ......
>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>> ........
>>>> {"id":"q_otstoj","key":11,"value":["d_gavrilenko_aleksandr","d_klishnev_sergej"]}
>>>> ]}
>>>>
>>>>
>>>> I tried restarting the server, recreating the view (removing the view
>>>> index file), and compacting both the view and the database; none of this
>>>> helps, it still returns duplicates.
>>>> What is happening? How can I avoid it in the future?
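>>>>
>>>> (The compaction attempts were along these lines - a sketch, assuming the
>>>> design document is the "queues" one shown above:
>>>>
>>>>   curl -X POST 'http://node2:5984/exhaust/_compact' -H 'Content-Type: application/json'
>>>>   curl -X POST 'http://node2:5984/exhaust/_compact/queues' -H 'Content-Type: application/json'
>>>>   curl -X POST 'http://node2:5984/exhaust/_view_cleanup' -H 'Content-Type: application/json'
>>>> )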
>>>>
>>>> --
>>>> ----------------
>>>> Best regards
>>>> Alexey Loshkarev
>>>> mailto:elf2001@gmail.com
>>>>
>>>
>>
>>
>>
>> --
>> ----------------
>> Best regards
>> Alexey Loshkarev
>> mailto:elf2001@gmail.com
>>
>
>
>
> --
> ----------------
> Best regards
> Alexey Loshkarev
> mailto:elf2001@gmail.com
>



-- 
----------------
Best regards
Alexey Loshkarev
mailto:elf2001@gmail.com
