incubator-couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: partial replications
Date Mon, 28 Sep 2009 22:38:41 GMT
On Sep 28, 2009, at 4:44 PM, Ning Tan wrote:

> On Mon, Sep 28, 2009 at 2:41 PM, Adam Kocoloski  
> <kocolosk@apache.org> wrote:
>> On Sep 28, 2009, at 1:21 PM, Ning Tan wrote:
>>
>>> Hi,
>>>
>>> When we replicate between a remote database and a local one (pulling
>>> from remote into local), we are observing partial replications,
>>> meaning that we have to issue repeated _replicate calls for the
>>> replication to complete. For a database with 10,000 documents, for
>>> example, it could take up to 7 calls for the entire database to
>>> replicate into an empty one. Each time, the number of documents
>>> replicated over seemed random.
>>>
>>> Thanks.
>>
>> Hi, it's certainly not the expected behavior.  When the POST to  
>> _replicate
>> returns and not all documents have been replicated, what does the  
>> response
>> look like?  Is there anything in the target log indicating a  
>> crash?  Can you
>> be more specific about the versions you are using?
>>
>> Best, Adam
>>
>
> Nothing indicated a crash. We have 0.10.0a818506 on a Mac, and
> something very close on an Ubuntu (I'll find the exact version later).
>
> Here's the replication response as well as the interesting logs on the
> target machine. It seems to me that most (though not all) partial
> replication runs are associated with a corresponding entry in the
> log that says "recording a checkpoint at source update_seq .....".
> (i.e. you can match the recorded_seq number in the replication
> response with the checkpoint update_seq numbers in the log).
>
> {"session_id":"439d41bad454ea5d5dcb16a154800a23","start_time":"Wed, 23
> Sep 2009 18:07:33 GMT","end_time":"Wed, 23 Sep 2009 18:07:53
> GMT","start_last_seq":8663,"end_last_seq":17619,"recorded_seq": 
> 17619,"missing_checked":0,"missing_found":8952,"docs_read": 
> 8952,"docs_written":8952,"doc_write_failures":0}
> {"session_id":"f85e575614479547d70277d24bff2d51","start_time":"Wed, 23
> Sep 2009 18:07:12 GMT","end_time":"Wed, 23 Sep 2009 18:07:17
> GMT","start_last_seq":7710,"end_last_seq":8663,"recorded_seq": 
> 8663,"missing_checked":0,"missing_found":953,"docs_read": 
> 953,"docs_written":953,"doc_write_failures":0}
> {"session_id":"84dc053e810b8a46f19c95ef560d42d5","start_time":"Wed, 23
> Sep 2009 18:06:32 GMT","end_time":"Wed, 23 Sep 2009 18:06:37
> GMT","start_last_seq":7021,"end_last_seq":7710,"recorded_seq": 
> 7710,"missing_checked":0,"missing_found":689,"docs_read": 
> 689,"docs_written":689,"doc_write_failures":0}
> {"session_id":"e72b655988ecc26b85b412fcaf05018a","start_time":"Wed, 23
> Sep 2009 18:05:47 GMT","end_time":"Wed, 23 Sep 2009 18:05:52
> GMT","start_last_seq":5792,"end_last_seq":7021,"recorded_seq": 
> 7021,"missing_checked":0,"missing_found":1229,"docs_read": 
> 1229,"docs_written":1229,"doc_write_failures":0}
> {"session_id":"8fd5d827721e70a28735ad4c3a291c3f","start_time":"Wed, 23
> Sep 2009 18:05:30 GMT","end_time":"Wed, 23 Sep 2009 18:05:35
> GMT","start_last_seq":4875,"end_last_seq":5792,"recorded_seq": 
> 5792,"missing_checked":0,"missing_found":917,"docs_read": 
> 917,"docs_written":917,"doc_write_failures":0}
> {"session_id":"187faed013cb2b63b714aab7845e3f56","start_time":"Wed, 23
> Sep 2009 18:05:02 GMT","end_time":"Wed, 23 Sep 2009 18:05:07
> GMT","start_last_seq":4539,"end_last_seq":4875,"recorded_seq": 
> 4875,"missing_checked":0,"missing_found":336,"docs_read": 
> 336,"docs_written":336,"doc_write_failures":0}
> {"session_id":"e30ee09b3da0dd979d655382bc3dadc8","start_time":"Wed, 23
> Sep 2009 18:04:23 GMT","end_time":"Wed, 23 Sep 2009 18:04:34
> GMT","start_last_seq":1590,"end_last_seq":4539,"recorded_seq": 
> 4539,"missing_checked":0,"missing_found":2949,"docs_read": 
> 2949,"docs_written":2949,"doc_write_failures":0}
> {"session_id":"3486a3b8d8a1e5eee05b82dcf4c66153","start_time":"Wed, 23
> Sep 2009 18:02:17 GMT","end_time":"Wed, 23 Sep 2009 18:02:22
> GMT","start_last_seq":0,"end_last_seq":1590,"recorded_seq": 
> 1590,"missing_checked":0,"missing_found":1590,"docs_read": 
> 1590,"docs_written":1590,"doc_write_failures":0}
>
> [Wed, 23 Sep 2009 18:04:28 GMT] [info] [<0.1959.0>] recording a
> checkpoint at source update_seq 3632
>
> [Wed, 23 Sep 2009 18:04:34 GMT] [info] [<0.1959.0>] recording a
> checkpoint at source update_seq 4539
>
> [Wed, 23 Sep 2009 18:04:41 GMT] [info] [<0.1941.0>] 127.0.0.1 - -
> 'POST' /_replicate 200
>
> [Wed, 23 Sep 2009 18:05:02 GMT] [info] [<0.1941.0>] starting
> replication "9577548b0faafa46430af6d8b2898a47" at <0.4981.0>
>
> [Wed, 23 Sep 2009 18:05:07 GMT] [info] [<0.4981.0>] recording a
> checkpoint at source update_seq 4875
>
> [Wed, 23 Sep 2009 18:05:17 GMT] [info] [<0.1941.0>] 127.0.0.1 - -
> 'POST' /_replicate 200
>
> [Wed, 23 Sep 2009 18:05:30 GMT] [info] [<0.1941.0>] starting
> replication "9577548b0faafa46430af6d8b2898a47" at <0.5376.0>
>
> [Wed, 23 Sep 2009 18:05:35 GMT] [info] [<0.5376.0>] recording a
> checkpoint at source update_seq 5792
>
> [Wed, 23 Sep 2009 18:05:43 GMT] [info] [<0.1941.0>] 127.0.0.1 - -
> 'POST' /_replicate 200
>
> [Wed, 23 Sep 2009 18:05:47 GMT] [info] [<0.1941.0>] starting
> replication "9577548b0faafa46430af6d8b2898a47" at <0.6322.0>
>
> [Wed, 23 Sep 2009 18:05:52 GMT] [info] [<0.6322.0>] recording a
> checkpoint at source update_seq 7021
>
> [Wed, 23 Sep 2009 18:05:59 GMT] [info] [<0.1941.0>] 127.0.0.1 - -
> 'POST' /_replicate 200
>
> [Wed, 23 Sep 2009 18:06:32 GMT] [info] [<0.1945.0>] starting
> replication "9577548b0faafa46430af6d8b2898a47" at <0.7609.0>
>
> [Wed, 23 Sep 2009 18:06:37 GMT] [info] [<0.7609.0>] recording a
> checkpoint at source update_seq 7710
>
> [Wed, 23 Sep 2009 18:06:41 GMT] [info] [<0.1945.0>] 127.0.0.1 - -
> 'POST' /_replicate 200
>
> [Wed, 23 Sep 2009 18:07:12 GMT] [info] [<0.7608.0>] starting
> replication "9577548b0faafa46430af6d8b2898a47" at <0.8369.0>
>
> [Wed, 23 Sep 2009 18:07:17 GMT] [info] [<0.8369.0>] recording a
> checkpoint at source update_seq 8663
>
> [Wed, 23 Sep 2009 18:07:20 GMT] [info] [<0.7608.0>] 127.0.0.1 - -
> 'POST' /_replicate 200
>
> [Wed, 23 Sep 2009 18:07:23 GMT] [info] [<0.7608.0>] 127.0.0.1 - -
> 'GET' /_utils/image/delete-mini.png 304
>
> [Wed, 23 Sep 2009 18:07:33 GMT] [info] [<0.7608.0>] starting
> replication "9577548b0faafa46430af6d8b2898a47" at <0.9376.0>
>
> [Wed, 23 Sep 2009 18:07:38 GMT] [info] [<0.9376.0>] recording a
> checkpoint at source update_seq 10821
>
> [Wed, 23 Sep 2009 18:07:44 GMT] [info] [<0.9376.0>] recording a
> checkpoint at source update_seq 13507
>
> [Wed, 23 Sep 2009 18:07:50 GMT] [info] [<0.9376.0>] recording a
> checkpoint at source update_seq 16222
>
> [Wed, 23 Sep 2009 18:07:53 GMT] [info] [<0.9376.0>] recording a
> checkpoint at source update_seq 17619

Hmm, I must admit I'm stumped so far.  Are you by any chance building  
from SVN repeatedly and installing into the same prefix?  Please feel  
free to file a ticket in JIRA[1] so we don't forget about this.  You  
might try again with the log level on the target set to debug,  
although I'm not certain it will tell us anything.  I'll see if I can  
find a way to reproduce this.  Best,

Adam

[1]: https://issues.apache.org/jira/browse/COUCHDB
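
For anyone comparing their own runs, the session entries quoted in this thread can be checked mechanically. What follows is only a rough Python sketch, not anything taken from CouchDB itself: the helper name `summarize_sessions` is hypothetical, and the entries are copied by hand from the replication responses above, reduced to the fields the check needs and listed oldest first. It verifies that each session starts at the sequence the previous one recorded, and totals the documents written.

```python
# Session entries copied from the replication responses quoted in this
# thread, oldest first, keeping only the fields this check uses.
SESSIONS = [
    {"start_last_seq": 0,    "end_last_seq": 1590,  "recorded_seq": 1590,  "docs_written": 1590},
    {"start_last_seq": 1590, "end_last_seq": 4539,  "recorded_seq": 4539,  "docs_written": 2949},
    {"start_last_seq": 4539, "end_last_seq": 4875,  "recorded_seq": 4875,  "docs_written": 336},
    {"start_last_seq": 4875, "end_last_seq": 5792,  "recorded_seq": 5792,  "docs_written": 917},
    {"start_last_seq": 5792, "end_last_seq": 7021,  "recorded_seq": 7021,  "docs_written": 1229},
    {"start_last_seq": 7021, "end_last_seq": 7710,  "recorded_seq": 7710,  "docs_written": 689},
    {"start_last_seq": 7710, "end_last_seq": 8663,  "recorded_seq": 8663,  "docs_written": 953},
    {"start_last_seq": 8663, "end_last_seq": 17619, "recorded_seq": 17619, "docs_written": 8952},
]


def summarize_sessions(sessions):
    """Hypothetical helper: report gaps between consecutive sessions.

    A gap is recorded whenever a session's start_last_seq differs from
    the recorded_seq of the session before it.
    """
    gaps = []
    for prev, cur in zip(sessions, sessions[1:]):
        if cur["start_last_seq"] != prev["recorded_seq"]:
            gaps.append((prev["recorded_seq"], cur["start_last_seq"]))
    return {
        "sessions": len(sessions),
        "gaps": gaps,
        "docs_written": sum(s["docs_written"] for s in sessions),
        "final_seq": sessions[-1]["recorded_seq"],
    }


if __name__ == "__main__":
    print(summarize_sessions(SESSIONS))
```

On the numbers posted here the chain has no gaps: every call picks up exactly where the previous checkpoint left off. That would be consistent with each call returning early after a checkpoint rather than skipping documents, though it does not by itself explain why.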

