db-derby-dev mailing list archives

From Øystein Grøvlen (JIRA) <j...@apache.org>
Subject [jira] Commented: (DERBY-3527) The slave will not notice that a network cable is unplugged and will therefore reject failover/stopSlave commands
Date Thu, 27 Mar 2008 12:57:25 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582633#action_12582633 ]

Øystein Grøvlen commented on DERBY-3527:
----------------------------------------

Thanks for the new patch, Jørgen.  I think the structure of the code
is looking very good now.  I still have some comments, though.

 - ReplicationMessageTransmit#sendMessageWaitForReply:
   The block synchronized on receiveSemaphore could be made smaller.
   Since the method is synchronized so that only one thread can be
   executing it at a time, I do not think you need to synchronize the
   setting of receivedMsg to null.  It is not read by other methods,
   so as long as it is done before sending the message, I do not see
   any potential conflicts with the MasterReceiverThread. 

 - As it is now, if the wait times out, I do not think you will be
   able to recover the connection.  Another call to
   sendMessageWaitForReply will result in two outstanding replies, and
   that is not currently taken into account.  It may be sufficient to
   just make sure that all users handle the situation correctly when
   this happens, but I think it would be safer to take down the
   connection instead.

 - I think there will be a null pointer exception if the slave does
   not reply in time during initialization of the connection.
   brokerConnection/verifyMessageType is not prepared for
   sendMessageWaitForReply returning null.

 - ReplicationMessageTransmit#tearDown: I think socketConn needs to be
   set to null in order for checkSocketConnection to serve its
   purpose.

 - ReplicationMessageReceive:
   pingSemaphore is really used for two purposes: 
     1. To wait until it is time to send a ping message
     2. To wait until a pong message has arrived.

   When a pong message arrives and notify() is called, there will most
   likely be two threads waiting, the thread waiting for the pong,
   and the pingThread waiting for the next time to send ping.  It is
   not deterministic which thread will be notified.  Hence, there is a
   risk that more than one ping is sent per request.

   I think this can be fixed by either using different monitors or by
   not sending a ping if a valid reply has been received (i.e.,
   connectionConfirmed == true).  I think the latter alternative also
   requires that notifyAll() is called when a pong message arrives.

 - ReplicationMessageReceive#readMessage:
   I think this method needs to handle two pongs arriving in
   sequence.  The way it is now, the second pong will be returned to
   the caller.

 - connectionConfirmed seems to be protected by pingSemaphore.  Hence,
   I do not think it needs to be volatile.

 - SlavePingThread#run, comment at the end:
   I guess you mean isConnectedToMaster.
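
The two sendMessageWaitForReply comments above can be sketched roughly
as follows.  This is an illustration under stated assumptions, not the
Derby code: ReplyWaiter, replyArrived and isConnected are hypothetical
names, the message is modeled as a plain Object, and the actual send
is passed in as a Runnable.  The monitor is held only while waiting,
and a timeout takes the connection down rather than leaving a reply
outstanding.

```java
import java.io.IOException;

// Hypothetical stand-in for the transmitter; not Derby's implementation.
class ReplyWaiter {
    private final Object receiveSemaphore = new Object();
    private Object receivedMsg;          // written by the receiver thread
    private boolean connected = true;

    /** Called by the receiver thread when a reply arrives. */
    void replyArrived(Object msg) {
        synchronized (receiveSemaphore) {
            receivedMsg = msg;
            receiveSemaphore.notify();
        }
    }

    /** Sends a message and waits at most timeoutMillis for the reply. */
    synchronized Object sendMessageWaitForReply(Runnable send, long timeoutMillis)
            throws IOException, InterruptedException {
        if (!connected) {
            throw new IOException("connection is down");
        }
        // Cleared before the send and outside receiveSemaphore, as the
        // review suggests; assumes no reply is outstanding at this point.
        receivedMsg = null;
        send.run();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (receiveSemaphore) {
            while (receivedMsg == null) {
                long remaining = (deadline - System.nanoTime()) / 1_000_000L;
                if (remaining <= 0) {
                    break;               // timed out
                }
                receiveSemaphore.wait(remaining);
            }
            if (receivedMsg != null) {
                return receivedMsg;
            }
        }
        // A late reply could be mistaken for the answer to the next
        // request, so give up on the connection instead of returning null.
        tearDown();
        throw new IOException("no reply within " + timeoutMillis + " ms");
    }

    synchronized void tearDown() {
        connected = false;
    }

    synchronized boolean isConnected() {
        return connected;
    }
}
```

Throwing on timeout also means callers never see a null reply, which
would sidestep the null pointer concern during connection setup.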

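The first alternative for pingSemaphore (different monitors) might look
something like the sketch below.  PingCoordinator, pingMonitor,
pongMonitor, awaitPingRequest and pongReceived are hypothetical names
chosen for the illustration, and it assumes the ping thread sends a
ping each time awaitPingRequest returns.  Since the ping thread and
the thread waiting for the pong never wait on the same object, a pong
cannot wake the ping thread by mistake.

```java
// Hypothetical sketch of the "different monitors" alternative.
class PingCoordinator {
    private final Object pingMonitor = new Object();  // ping thread waits here
    private final Object pongMonitor = new Object();  // pong waiters wait here
    private boolean pingRequested;                    // guarded by pingMonitor
    private boolean connectionConfirmed;              // guarded by pongMonitor

    /** Ping thread: blocks until someone asks for a ping. */
    void awaitPingRequest() throws InterruptedException {
        synchronized (pingMonitor) {
            while (!pingRequested) {
                pingMonitor.wait();
            }
            pingRequested = false;
        }
    }

    /** Receiver thread: a pong has arrived. */
    void pongReceived() {
        synchronized (pongMonitor) {
            connectionConfirmed = true;
            pongMonitor.notifyAll();
        }
    }

    /** Requests a ping and waits at most timeoutMillis for the pong. */
    synchronized boolean isConnectedToMaster(long timeoutMillis)
            throws InterruptedException {
        synchronized (pongMonitor) {
            connectionConfirmed = false;
        }
        synchronized (pingMonitor) {
            pingRequested = true;
            pingMonitor.notify();
        }
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (pongMonitor) {
            while (!connectionConfirmed) {
                long remaining = (deadline - System.nanoTime()) / 1_000_000L;
                if (remaining <= 0) {
                    return false;        // no pong in time
                }
                pongMonitor.wait(remaining);
            }
            return true;
        }
    }
}
```

With connectionConfirmed only read and written under pongMonitor, the
field does not need to be volatile in this arrangement either.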

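For the readMessage comment, one way to cope with any number of pongs
in a row is to loop until a non-pong message shows up.  The sketch
below is hypothetical: MessageReader, the PONG marker and the onPong
callback are made up for the illustration, and messages are modeled as
strings pulled from a Supplier rather than from an ObjectInputStream.

```java
import java.util.function.Supplier;

// Hypothetical sketch: pongs are consumed internally, never returned.
class MessageReader {
    static final String PONG = "PONG";   // stand-in for the pong message type

    private final Supplier<String> in;   // stand-in for the input stream
    private final Runnable onPong;       // e.g. confirm the connection

    MessageReader(Supplier<String> in, Runnable onPong) {
        this.in = in;
        this.onPong = onPong;
    }

    /** Returns the next message that is not a pong. */
    String readMessage() {
        while (true) {
            String msg = in.get();
            if (!PONG.equals(msg)) {
                return msg;              // ordinary message (or null at end)
            }
            onPong.run();                // handle the pong, then keep reading
        }
    }
}
```
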
> The slave will not notice that a network cable is unplugged and will therefore reject failover/stopSlave commands
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3527
>                 URL: https://issues.apache.org/jira/browse/DERBY-3527
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 10.4.0.0, 10.5.0.0
>            Reporter: Jørgen Løland
>            Assignee: Jørgen Løland
>         Attachments: derby-3527-1a.diff, derby-3527-1a.stat, derby-3527-1b.diff, derby-3527-1b.stat
>
>
> If a network cable between the master and slave is unplugged (or a switch crashes etc),
> ObjectInputStream#readObject will not get an exception. Neither the socket nor the input stream
> can be queried for information on whether or not the connection is working. AFAIK, the only
> way to find out if the network is down is to send a message.
> The slave commands stopSlave and failover are rejected if the network connection is working.
> To be absolutely sure that the connection is working, we need to ping the master when these
> commands are requested.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

