activemq-dev mailing list archives

From "Eric (JIRA)" <>
Subject [jira] Commented: (AMQ-2774) Network of brokers : Multicast discovery stopped to work
Date Mon, 26 Jul 2010 12:52:53 GMT


Eric commented on AMQ-2774:

Hi Gary

It's very difficult to simulate quick network faults. In my JUnit test, I simulate a close()
either immediately or a few seconds later (after a random delay). When the close() happens
immediately, I was able to validate a DUPLEX network of brokers and confirm that nothing
blocks in this situation with my patch:

2010-07-26 14:09:20,001 [ce[SpokeBroker]] INFO  DiscoveryNetworkConnector      - Establishing
network connection from vm://SpokeBroker to tcpfaulty://localhost.localdomain:61617
2010-07-26 14:09:20,035 [ocalport=32972]] INFO  SocketTstFactory               - Trying to
close client socket Socket[addr=localhost.localdomain/,port=61617,localport=32972]
2010-07-26 14:09:20,036 [ocalport=32972]] INFO  SocketTstFactory               - Client socket
Socket[addr=localhost.localdomain/,port=61617,localport=32972] is closed.
2010-07-26 14:09:20,037 [] WARN  DemandForwardingBridge         - Network connection
between vm://SpokeBroker#8 and tcpfaulty://localhost.localdomain/ shutdown
due to a remote error: Socket closed
2010-07-26 14:09:20,038 [NetworkBridge  ] INFO  DemandForwardingBridge         - SpokeBroker
bridge to Unknown stopped

In this kind of situation (bridge to Unknown stopped), I observed in a 5.3.0-05 Fuse production
environment that the network connector thread was completely blocked on the latch, with
duplex connections.
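
To illustrate the kind of blocking described above, here is a minimal sketch of a bounded wait on a latch. It is not the ActiveMQ bridge code and not my patch; the class name BoundedLatchSketch and the method awaitStart are hypothetical. It only shows that java.util.concurrent.CountDownLatch.await with a timeout returns false when nobody releases the latch, instead of blocking the calling thread forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not taken from ActiveMQ or from the AMQ-2774 patch.
final class BoundedLatchSketch {

    /**
     * Waits for the latch to be released, but gives up after timeoutMillis.
     * Returns true if the latch reached zero in time, false on timeout.
     */
    static boolean awaitStart(CountDownLatch started, long timeoutMillis)
            throws InterruptedException {
        return started.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: the remote side died before releasing the latch.
        CountDownLatch neverReleased = new CountDownLatch(1);

        // An unbounded neverReleased.await() would block this thread forever.
        boolean released = awaitStart(neverReleased, 100);
        System.out.println("released in time: " + released);
    }
}
```

With a bounded wait, the caller can log the failure and retry rather than staying stuck.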

I'm not sure that my JUNIT test demonstrates the problem on 5.3.0-05. It helped me to debug
my own patch.
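
For readers who want to try something similar, the fault injection can be sketched roughly as follows. This is not the SocketTstFactory from the attachment; the class FaultySocketSketch and its methods pickDelayMillis and closeLater are hypothetical names, and the 50/50 choice between an immediate and a delayed close is an assumption made for the example.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not the SocketTstFactory used in the attached test.
final class FaultySocketSketch {

    // Daemon thread so the scheduler never keeps the JVM alive after a test run.
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "fault-injector");
                t.setDaemon(true);
                return t;
            });

    /** Picks 0 (close immediately) or a random delay in [1, maxDelayMillis]. */
    static long pickDelayMillis(Random random, long maxDelayMillis) {
        if (maxDelayMillis <= 0 || random.nextBoolean()) {
            return 0;
        }
        return 1 + (long) (random.nextDouble() * maxDelayMillis);
    }

    /** Closes the socket right away when delayMillis <= 0, otherwise after the delay. */
    static void closeLater(Socket socket, long delayMillis) {
        Runnable close = () -> {
            try {
                socket.close();
            } catch (IOException ignored) {
                // The socket is being closed on purpose; nothing to recover here.
            }
        };
        if (delayMillis <= 0) {
            close.run();
        } else {
            SCHEDULER.schedule(close, delayMillis, TimeUnit.MILLISECONDS);
        }
    }
}
```

A test would call closeLater(socket, pickDelayMillis(random, 5000)) right after the client socket is created, then check that the bridge stops and is reestablished.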

I have not yet run my JUnit test against the 5.3.0-5 Fuse version. I'm going to check whether
it sometimes shows the problem with the 5.3.0-5 core jar.

I can also look at the 5.4-snapshot source code to see whether anything has already changed
around this latch on the trunk.

I will let you know my results.


> Network of brokers : Multicast discovery stopped to work
> --------------------------------------------------------
>                 Key: AMQ-2774
>                 URL:
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.2.0
>         Environment: Linux
>            Reporter: Eric
>            Assignee: Gary Tully
>             Fix For: 5.4.1
>         Attachments: AMQ2774.tar, JMAC-BEA-lastlog.log-20100315
> Hi everybody
> I am experiencing a big problem with the multicast discovery algorithm in a network of brokers.
> Under some conditions, a broker can't reestablish a distant connection even if the distant
> broker is restarted.
> I have log traces that should help to identify the origin of the problem.
> When there is no discovery/connection error, I can see these 2 lines in the activemq
> log file:
> #08 Jun 2010 14:31:30,639  INFO  [Multicast Discovery Agent Notifier]
> Establishing network connection between from vm://ACCLU-tpnocp04v to tcp://tpnocp09v-bus:13100?useLocalHost=false
> #08 Jun 2010 14:31:30,692  INFO  [StartLocalBridge: localBroker=vm://ACCLU-tpnocp04v#26]
> Network connection between vm://ACCLU-tpnocp04v#26 and tcp://tpnocp09v-bus/ has been established.
> When the connection is broken, I can see this line in the log.
> #11 Jun 2010 12:37:32,585  INFO  [Multicast Discovery Agent Notifier]
> ACCLU-tpnocp04v bridge to MOM-tpnocp09v stopped
> Then the current ACCLU-tpnocp04v broker tries to reestablish the connection :
> #11 Jun 2010 12:37:34,475  INFO  [Multicast Discovery Agent Notifier]
> Establishing network connection between from vm://ACCLU-tpnocp04v to tcp://tpnocp09v-bus:13100?useLocalHost=false
> But here, the second line of the log ("has been established") doesn't appear in the
> log file! I don't know exactly whether the connection is up or not.
> Then the connection is broken again (note "Unknown" instead of "MOM-tpnocp09v").
> #11 Jun 2010 13:33:58,655  WARN  [ActiveMQ Transport: tcp://tpnocp09v-bus/]
> Network connection between vm://ACCLU-tpnocp04v#58 and tcp://tpnocp09v-bus/ shutdown due to a remote error: Connection reset
> #11 Jun 2010 13:33:58,657  INFO  [NetworkBridge]
> ACCLU-tpnocp04v bridge to Unknown stopped
> And now, even if I restart the distant broker (MOM-tpnocp09v), no line ("Establishing" or
> "has been established") appears, and no network connection is reestablished between
> ACCLU-tpnocp04v and MOM-tpnocp09v. It seems that this ACCLU-tpnocp04v broker can no longer
> establish a connection with the MOM-tpnocp09v broker!
> The production teams tell me that this problem seems not to be resolved in fuse-
> Eric-AWL

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
