db-derby-user mailing list archives

From Knut Anders Hatlen <knut.hat...@oracle.com>
Subject Re: Replication Master stop, but Slave still alive
Date Wed, 12 Jun 2013 08:11:02 GMT
Dag Wanvik <dag.wanvik@oracle.com> writes:

> On 11.06.2013 18:50, benrahman wrote:
>
>     /Master derby.log/
>     
>     ----  BEGIN REPLICATION ERROR MESSAGE (6/5/13 3:35 PM) ----
>     Exception occurred during log shipping.
>     java.net.SocketException: Connection reset by peer: socket write error
>             at java.net.SocketOutputStream.socketWrite0(Native Method)
>             at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>             at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>
> Looks like the socket the master uses to ship log records to the slave stopped working; it is hard
> to say what the issue is here. Do you see anything
> in the slave's log file at the same time?
>
> Later replication error messages in the master's log file show that the buffer fills
> up (since the master can't send):
>
>> ----  BEGIN REPLICATION ERROR MESSAGE (6/6/13 5:46 PM) ----
>> Exception occurred during log shipping.
>> org.apache.derby.impl.store.replication.buffer.LogBufferFullException
>>       at
>> org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.switchDirtyBuffer(Unknown
>
> Not sure why the slave doesn't fail over; maybe the master process needs to be stopped
> (crash) before that will happen.
> It is probably right that it doesn't happen when you first see the socket write error;
> that error could be due to an intermittent network problem.

That's right. The master is supposed to keep trying to reconnect until
there's no more space in the replication log buffers, according to
http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailures.html.
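To make that behaviour concrete, here is a simplified Java model of it. It is not Derby's actual code: the class `LogShippingModel` and its methods are hypothetical. It assumes a fixed-capacity buffer on the master. Records that cannot be shipped stay buffered and are retried later. Once the buffer is full, replication stops, which roughly corresponds to the `LogBufferFullException` in the master's log above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of master-side log shipping.
 * NOT Derby's ReplicationLogBuffer; names and behaviour are illustrative only.
 */
class LogShippingModel {
    private final int capacity;
    private final Deque<String> buffer = new ArrayDeque<>();
    private boolean replicationStopped = false;

    LogShippingModel(int capacity) { this.capacity = capacity; }

    /** Buffers a log record; returns false once replication has been stopped. */
    boolean append(String record) {
        if (replicationStopped) return false;
        if (buffer.size() >= capacity) {
            // Roughly where Derby would report LogBufferFullException: in this
            // model the master stops replicating and discards the buffer.
            replicationStopped = true;
            buffer.clear();
            return false;
        }
        buffer.addLast(record);
        return true;
    }

    /** Tries to ship buffered records; if the slave is unreachable they stay buffered. */
    int ship(boolean slaveReachable) {
        if (replicationStopped || !slaveReachable) return 0;
        int shipped = buffer.size();
        buffer.clear();
        return shipped;
    }

    boolean isStopped() { return replicationStopped; }
    int buffered() { return buffer.size(); }

    public static void main(String[] args) {
        LogShippingModel m = new LogShippingModel(3);
        m.append("r1");
        m.ship(false);                      // network glitch: r1 stays buffered
        m.append("r2");
        System.out.println(m.ship(true));   // link is back: prints 2
        for (int i = 0; i < 5; i++) {
            m.append("r" + (i + 3));        // slave gone for good: buffer fills up
        }
        System.out.println(m.isStopped());  // prints true
    }
}
```

Note that in this model, as in the thread, nothing on the slave side changes when the master gives up: the slave simply stops receiving records.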

> But I believe the slave and master have a keep-alive protocol to enable the slave to
> fail over when the master is no longer seen to be alive.

I think the slave never fails over automatically, even if it detects
that it has lost contact with the master. It has to be told to do so.
See http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailover.html,
which says:

  There is no automatic failover or restart of replication after one of
  the instances has failed.
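Here is a minimal sketch of telling the slave to take over. It assumes an embedded slave database (the name `myDB` and the class `SlaveFailover` are hypothetical) and uses Derby's `failover=true` connection URL attribute. As far as we understand the failover page linked above, a failover requested on the slave is only accepted once the slave has lost contact with the master. Derby may report the outcome of the request through an `SQLException`, so the sketch prints the SQLState rather than treating every exception as fatal. Check the returned state against the Derby reference manual.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/**
 * Sketch: request a manual failover on a Derby slave.
 * Assumes the Derby embedded driver is on the classpath; "myDB" is a placeholder.
 */
class SlaveFailover {

    /** Builds an embedded connection URL carrying the failover=true attribute. */
    static String failoverUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";failover=true";
    }

    /** Requests failover; returns the SQLState of any SQLException raised, else null. */
    static String requestFailover(String dbName) {
        try (Connection c = DriverManager.getConnection(failoverUrl(dbName))) {
            return null; // no exception was raised; consult derby.log for details
        } catch (SQLException e) {
            System.out.println(e.getSQLState() + ": " + e.getMessage());
            return e.getSQLState();
        }
    }

    public static void main(String[] args) {
        String db = args.length > 0 ? args[0] : "myDB"; // hypothetical database name
        requestFailover(db);
    }
}
```

If the slave runs under the Network Server, the URL would instead take the client form, e.g. `jdbc:derby://slavehost:1527/myDB;failover=true`, where the host and port are placeholders. After a successful failover the slave is an ordinary database, and replication has to be set up again by hand.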


-- 
Knut Anders
