On Wednesday, 21.10.2009, at 09:08 -0400, Adam Kocoloski wrote:
> > Though in the logs I now see lots of
> >
> > [info] [<0.164.0>] A server has restarted sinced replication start. Not
> > recording the new sequence number to ensure the replication is redone
> > and documents reexamined.
> >
> > messages. I posted this in IRC yesterday and was told that this is
> > nothing to worry about. So what exactly does it mean, and why is it
> > logged at info level when it can be ignored?
> >
> > If that message is nothing critical, I would suggest logging it at debug
> > level, as it is shown at every replication checkpoint on every node as
> > soon as one of the other nodes has been offline.
>
> So, what we're trying to do here is avoid skipping updates from the
> source server. Consider the following sequence of events:
>
> 1) Save some docs to the source with delayed_commits=true
> 2) Replicate source -> target
> 3) Restart source before full commit, losing the updates that have
>    already been replicated
> 4) Save more docs to source, overwriting previously used sequence numbers
>
> If that happens, we don't want the replicator to skip the new docs
> that have been saved in step 4. So if we detect that a server
> restarted, we play it safe and don't checkpoint, so that the next
> replication will re-examine the sequence. An analogous situation
> could happen with the target losing updates that the replicator had
> written (but not fully committed).
>
> Skipping checkpointing altogether for the remainder of the replication
> is an overly conservative position. In my opinion what we should do
> when we detect this condition is restart the replication immediately
> from the last known checkpoint. Then you'd see one of these [info]
> level messages telling you that the replicator is going to restart to
> double-check some sequence numbers, and that's it.

OK, understood. Thanks for the explanation. If that behaviour only
happened once, I would be absolutely fine with it. But with the current
implementation it goes on forever, and replication never seems to
switch back to normal mode.
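
To make the four-step scenario quoted above concrete, here is a minimal
sketch in Python. It is a toy model, not CouchDB code: the Source class,
boot_id, pull() and next_checkpoint() are hypothetical names invented for
illustration, and a list index stands in for the sequence number. It only
shows why recording the checkpoint after a source restart can skip
documents, and why keeping the old checkpoint avoids that.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Toy source database (hypothetical, not the CouchDB API)."""
    docs: list = field(default_factory=list)  # list index + 1 acts as the sequence number
    committed: int = 0                         # how many docs are safely on disk
    boot_id: int = 1                           # changes on every restart

    def save(self, doc):
        # Delayed commit: the doc is visible but not yet counted as committed.
        self.docs.append(doc)

    def restart(self):
        # Uncommitted docs are lost, so their sequence numbers get reused.
        del self.docs[self.committed:]
        self.boot_id += 1


def pull(source, target, since):
    """Copy every doc after sequence `since`; return the last sequence seen."""
    new_docs = source.docs[since:]
    target.extend(d for d in new_docs if d not in target)
    return len(source.docs)


def next_checkpoint(source, boot_id_at_start, old_seq, new_seq):
    """Play it safe: keep the old checkpoint if the source restarted."""
    return old_seq if source.boot_id != boot_id_at_start else new_seq


def run_scenario(safe):
    source, target = Source(), []
    source.save("a")
    source.save("b")                       # 1) save docs with delayed commits
    boot_id = source.boot_id
    seq = pull(source, target, since=0)    # 2) replicate source -> target
    source.restart()                       # 3) restart before full commit
    source.save("c")
    source.save("d")                       # 4) new docs reuse sequences 1 and 2
    checkpoint = next_checkpoint(source, boot_id, 0, seq) if safe else seq
    pull(source, target, since=checkpoint) # the next replication run
    return target
```

In this model run_scenario(False) records the checkpoint blindly and the
target never receives "c" and "d", while run_scenario(True) re-examines
the sequence range and picks them up. Note that the sketch compares
against a boot_id remembered once at the start; if that remembered value
were never refreshed after the replay, every later checkpoint would be
skipped as well.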
Best regards
Simon
--
Simon Eisenmann
[ mailto:simon@struktur.de ]
[ struktur AG | Kronenstraße 22a | D-70173 Stuttgart ]
[ T. +49.711.896656.68 | F.+49.711.89665610 ]
[ http://www.struktur.de | mailto:info@struktur.de ]