activemq-dev mailing list archives

From "Eric (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (AMQ-2774) Network of brokers : Multicast discovery stopped to work
Date Thu, 22 Jul 2010 13:39:53 GMT

    [ https://issues.apache.org/activemq/browse/AMQ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=60843#action_60843
] 

Eric edited comment on AMQ-2774 at 7/22/10 9:38 AM:
----------------------------------------------------

I propose this patch, attached as a tar file (for ActiveMQ version 5.3.2).

It makes two modifications:
- The RemoteBrokerNameKnownLatch is counted down at the end of stop(), to free the connector
thread even if it never receives the correct remote broker info. This can happen when the network
goes up and down several times in quick succession. With a DUPLEX connection, the network connector
was then totally blocked. With a non-duplex connection, some dead threads were left behind.
- The second modification tries to stop an old, invalid duplex Transport Connection when a
new duplex Transport Connection is requested by the same broker for the same Transport Connector.
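
To illustrate the first modification, here is a minimal sketch of the idea, not the code from the patch. The class BridgeSketch and its methods are hypothetical stand-ins; only the latch name and the "Unknown" placeholder (seen in the log quoted below) come from this issue. It assumes the connector thread blocks on the latch until the remote broker info arrives.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a network bridge; this is not ActiveMQ's actual class.
class BridgeSketch {
    // Released once the remote broker name is known, or when the bridge is stopped.
    private final CountDownLatch remoteBrokerNameKnownLatch = new CountDownLatch(1);
    private volatile String remoteBrokerName = "Unknown";

    // Called when the remote broker info is received.
    void onRemoteBrokerInfo(String name) {
        remoteBrokerName = name;
        remoteBrokerNameKnownLatch.countDown();
    }

    // Connector thread: blocks until the name is known or the bridge is stopped.
    String awaitRemoteBrokerName() throws InterruptedException {
        remoteBrokerNameKnownLatch.await();
        return remoteBrokerName;
    }

    void stop() {
        // ... transports would be released here ...
        // Count down last, so a waiting thread is freed even if the
        // remote broker info never arrived.
        remoteBrokerNameKnownLatch.countDown();
    }
}
```

Calling countDown() on a latch that has already reached zero has no effect, so stop() stays safe when the remote info did arrive earlier.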

I tried to write a JUnit test that simulates many socket closes, including during the bridge
start process. It seems to work, but I did not succeed in simulating the second modification.

The JUnit test includes a "tcpfaulty" transport with two socket factories. In the future I
will try to put the SocketProxy code into the ServerSocketTstFactory.

The situation in which a second duplex connection was attempted before the first one
was clearly dead appeared in my own (non-JUnit) tests, when a brief network fault occurred
and only the first modification was implemented.
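
The second modification (stopping a stale duplex connection when the same broker reconnects) could be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: DuplexConnectionSketch and DuplexRegistrySketch are hypothetical names, and the sketch assumes one registry per transport connector, keyed by a remote broker identifier.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical handle for one duplex connection from a remote broker.
class DuplexConnectionSketch {
    final String remoteBrokerId;
    private volatile boolean stopped;

    DuplexConnectionSketch(String remoteBrokerId) {
        this.remoteBrokerId = remoteBrokerId;
    }

    void stop() {
        stopped = true;
    }

    boolean isStopped() {
        return stopped;
    }
}

// Hypothetical per-connector registry: at most one live duplex connection per remote broker.
class DuplexRegistrySketch {
    private final Map<String, DuplexConnectionSketch> byBroker = new ConcurrentHashMap<>();

    // Registers the incoming connection and stops any older one from the same broker.
    void register(DuplexConnectionSketch incoming) {
        DuplexConnectionSketch previous = byBroker.put(incoming.remoteBrokerId, incoming);
        if (previous != null && previous != incoming) {
            previous.stop();
        }
    }
}
```

Using put() makes the swap of old for new a single atomic step on the map, so two quick reconnects from the same broker cannot both remain registered.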

All modifications are clearly marked with:
// Eric-AWL AMW-2774 Beginning
// Eric-AWL AMW-2774 End

Sorry for the poor English (I'm French).

Eric-AWL



  
> Network of brokers : Multicast discovery stopped to work
> --------------------------------------------------------
>
>                 Key: AMQ-2774
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2774
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.2.0
>         Environment: Linux
>            Reporter: Eric
>             Fix For: 5.4.1
>
>         Attachments: AMQ2774.tar, JMAC-BEA-lastlog.log-20100315
>
>
> Hi everybody,
> I am experiencing a serious problem with the multicast discovery algorithm in a network of brokers topology.
> Under some conditions, a broker cannot reestablish a connection to a remote broker, even after the remote broker is restarted.
> I have log traces that should help identify the origin of the problem.
> When there is no discovery/connection error, I can see these 2 lines in the activemq log file:
> #08 Jun 2010 14:31:30,639  INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to tcp://tpnocp09v-bus:13100?useLocalHost=false
> #08 Jun 2010 14:31:30,692  INFO  [StartLocalBridge: localBroker=vm://ACCLU-tpnocp04v#26] org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#26 and tcp://tpnocp09v-bus/10.18.126.28:13100(MOM-tpnocp09v) has been established.
> When the connection is broken, I can see this line in the log.
> #11 Jun 2010 12:37:32,585  INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to MOM-tpnocp09v stopped
> Then the current ACCLU-tpnocp04v broker tries to reestablish the connection :
> #11 Jun 2010 12:37:34,475  INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to tcp://tpnocp09v-bus:13100?useLocalHost=false
> But here, the second log line ("has been established") does not appear in the log file. I do not know whether the connection is up or not.
> Then the connection is broken again (note "Unknown" instead of "MOM-tpnocp09v"):
> #11 Jun 2010 13:33:58,655  WARN  [ActiveMQ Transport: tcp://tpnocp09v-bus/10.18.126.28:13100] org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#58 and tcp://tpnocp09v-bus/10.18.126.28:13100 shutdown due to a remote error: java.net.SocketException: Connection reset
> #11 Jun 2010 13:33:58,657  INFO  [NetworkBridge] org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to Unknown stopped
> Now, even if I restart the remote broker (MOM-tpnocp09v), neither line ("Establishing" / "has been established") appears, and no network connection is reestablished between ACCLU-tpnocp04v and MOM-tpnocp09v. It seems that the ACCLU-tpnocp04v broker can no longer establish a connection with the MOM-tpnocp09v broker.
> The production teams tell me that this problem does not seem to be resolved in the fuse-5.3.0.6 version.
> Eric-AWL

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

