incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Replication hangs
Date Mon, 19 Oct 2009 13:53:09 GMT
On Mon, Oct 19, 2009 at 9:48 AM, Simon Eisenmann <simon@struktur.de> wrote:
> Hi Paul,
>
> thanks for your feedback!
>
> On Monday, 19.10.2009, at 09:40 -0400, Paul Davis wrote:
>> Are there any tracebacks in the logs that you can paste? I don't think
>> I've heard of replication getting wedged without some sort of
>> feedback.
>
> Unfortunately there was no error in the logs or on stderr. Also any
> further replication request does hang as well (never completes). The
> last entry is always "recording a checkpoint at source update_seq ...".
>
> Please note that this is reproducible, meaning it happens every time,
> though the time frame varies.
>
>> Also, are you using continuous replication then? I do know that just
>> before the 0.10.0 release that Adam Kocoloski and Robert Newson spent
>> a good amount of time getting star (all nodes replicate continuously
>> to all others) kinks ironed out. Or maybe it was a ring. I dunno, but
>> there was work on something like that.
>
> I am not using continuous replication but an update notification process
> triggering pull replication on the other nodes from the database which
> was changed. Your point regarding rings is interesting. In general that
> would explain it. Though in case of a ring i would have multiple hanging
> replications at the same time correct? It always starts with one
> direction hanging. The other way around usually works just fine until it
> hangs some time (hours) later.
>
> Also I have tested this with a couple of SVN revisions before the 0.10.0
> release and things improved a lot since the first tests. Though now I
> have much more data, with database update sequences in the millions range.
>
> Best regards
> Simon
>
>
>
>>
>> Paul Davis
> --
> Simon Eisenmann
>
> [ mailto:simon@struktur.de ]
>
> [ struktur AG | Kronenstraße 22a | D-70173 Stuttgart ]
> [ T. +49.711.896656.68 | F.+49.711.89665610 ]
> [ http://www.struktur.de | mailto:info@struktur.de ]
>

Simon,

Hmmm, that sounds most odd. Is there any consistency in when it
hangs? Specifically, does it look like it's a poison doc that causes
things to go wonky or some such? Do nodes fail in a specific order?

Also, you might try setting up continuous replication instead of
the update notifications, as that might be a bit more ironed out.
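
For illustration, continuous replication can be requested by POSTing to
the server's _replicate resource with "continuous": true in the JSON
body. The sketch below only builds that request and does not send it.
The server URL, host names and database name are placeholders, not
values from this thread.

```python
import json
import urllib.request


def continuous_replication_request(server, source, target):
    """Build (but do not send) a POST to /_replicate asking for continuous replication."""
    body = {"source": source, "target": target, "continuous": True}
    return urllib.request.Request(
        server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database name: pull from node-a into the local "mydb".
# To actually start the replication: urllib.request.urlopen(req)
req = continuous_replication_request(
    "http://localhost:5984", "http://node-a:5984/mydb", "mydb"
)
```

With a remote source and a local target this is a pull replication, the
same direction the update notification process triggers.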

Another thing to check is whether it's just the task status that's wonky
rather than the actual replication. You can check the _local doc that's
created by replication to see if its update seq is changing while the
task statuses aren't.
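
A minimal sketch of that check, under stated assumptions: the URL of the
replication's _local checkpoint doc has to be supplied by hand (its ID
is not given in this thread), and the field name "source_last_seq" is an
assumption that may need adjusting for the version in use. The
/_active_tasks resource is used for the task statuses.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def diagnose(seq_before, seq_after, tasks_before, tasks_after):
    """Compare two samples of the checkpoint seq and of the task list."""
    seq_moved = seq_before != seq_after
    tasks_moved = tasks_before != tasks_after
    if seq_moved and not tasks_moved:
        return "replication advancing, task status stale"
    if seq_moved:
        return "replication advancing"
    if tasks_moved:
        return "task status changing, checkpoint not advancing"
    return "no visible progress"


def sample(server, checkpoint_url, seq_field="source_last_seq"):
    """Take one sample; checkpoint_url and seq_field are assumptions to adjust."""
    doc = fetch_json(checkpoint_url)
    tasks = fetch_json(server.rstrip("/") + "/_active_tasks")
    return doc.get(seq_field), tasks


def watch(server, checkpoint_url, interval=60):
    """Sample twice, `interval` seconds apart, and report what changed."""
    seq1, tasks1 = sample(server, checkpoint_url)
    time.sleep(interval)
    seq2, tasks2 = sample(server, checkpoint_url)
    return diagnose(seq1, seq2, tasks1, tasks2)
```

Only a result of "replication advancing, task status stale" points at
the status display alone. If the source database receives no writes
during the interval, "no visible progress" is expected and does not by
itself indicate a hang.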

Paul Davis
