Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm
Reply-To: dev@activemq.apache.org
Message-ID: <32598438.13821276414373270.JavaMail.jira@thor>
Date: Sun, 13 Jun 2010 03:32:53 -0400 (EDT)
From: "Eric (JIRA)"
To: dev@activemq.apache.org
Subject: [jira] Commented: (AMQ-2774) Network of brokers : Multicast discovery stopped to work
In-Reply-To: <32477170.13361276270251772.JavaMail.jira@thor>

    [ https://issues.apache.org/activemq/browse/AMQ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=60035#action_60035 ]

Eric commented on AMQ-2774:
---------------------------

Hi

I looked at MulticastDiscoveryAgent.java
source. The "doRecovery" method is the only one that signals that the connection must be re-established. The "failed" indicator is set to false as soon as the recovery conditions are met, even if the connection is not actually re-established (which seems to be my case). Once "failed" is false, the recovery and the reconnection (fireServiceAddEvent) cannot be retried. Only the "serviceFailed" method can set "failed" back to true.

Just from reading this Java source, a situation seems possible in which the connection is not re-established, the "failed" indicator is false, and the multicast heartbeat frames are still being received. Only a call to "serviceFailed" can set the failed indicator back to true (or delete the object's entry in the hashmap).

Is it possible that serviceFailed is not called even though the connection is down?

(I think it would be safer for "failed" to be an atomic boolean, since I believe the serviceFailed method could be called by another thread.)

Eric-AWL

> Network of brokers : Multicast discovery stopped to work
> --------------------------------------------------------
>
>                 Key: AMQ-2774
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2774
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.2.0
>         Environment: Linux
>            Reporter: Eric
>
> Hi everybody
> I am experiencing a serious problem with the multicast discovery algorithm in a network of brokers topology.
> Under some conditions, a broker cannot re-establish a connection to a remote broker, even if the remote broker is restarted.
> I have log traces that should help to identify the origin of the problem.
> When there is no discovery/connection error, I can see these two lines in the ActiveMQ log file:
> #08 Jun 2010 14:31:30,639 INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to tcp://tpnocp09v-bus:13100?useLocalHost=false
> #08 Jun 2010 14:31:30,692 INFO  [StartLocalBridge: localBroker=vm://ACCLU-tpnocp04v#26] org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#26 and tcp://tpnocp09v-bus/10.18.126.28:13100(MOM-tpnocp09v) has been established.
> When the connection is broken, I can see this line in the log:
> #11 Jun 2010 12:37:32,585 INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to MOM-tpnocp09v stopped
> Then the current ACCLU-tpnocp04v broker tries to re-establish the connection:
> #11 Jun 2010 12:37:34,475 INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to tcp://tpnocp09v-bus:13100?useLocalHost=false
> But here, the second line ("has been established") does not appear in the log file! I don't know exactly whether the connection is up or not.
> Then the connection is broken again (note "Unknown" instead of "MOM-tpnocp09v"):
> #11 Jun 2010 13:33:58,655 WARN  [ActiveMQ Transport: tcp://tpnocp09v-bus/10.18.126.28:13100] org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#58 and tcp://tpnocp09v-bus/10.18.126.28:13100 shutdown due to a remote error: java.net.SocketException: Connection reset
> #11 Jun 2010 13:33:58,657 INFO  [NetworkBridge] org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to Unknown stopped
> And now, even if I restart the remote broker (MOM-tpnocp09v), no line (Establishing / has been established) appears, and no network connection is re-established between ACCLU-tpnocp04v and MOM-tpnocp09v. It seems that the ACCLU-tpnocp04v broker can no longer establish a connection with the MOM-tpnocp09v broker!
> The production teams tell me that this problem does not seem to be resolved in the fuse-5.3.0.6 version.
> Eric-AWL

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
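
To make the concern in the comment above concrete, here is a minimal sketch of the "failed" flag as an atomic boolean. It is a simplified, hypothetical model and not the MulticastDiscoveryAgent source: the class name RemoteBrokerState and the methods markFailed, tryStartRecovery and reconnectNotEstablished are assumptions made for illustration only. The last method is a hypothetical hook showing how the flag could be re-armed when a reconnection attempt never completes, which is the case that seems to be missing.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical, simplified model of per-broker discovery state (not ActiveMQ code). */
final class RemoteBrokerState {
    // true while the remote service is considered down and eligible for recovery
    private final AtomicBoolean failed = new AtomicBoolean(false);

    /** Role of serviceFailed: returns true only for the caller that flipped the flag. */
    boolean markFailed() {
        return failed.compareAndSet(false, true);
    }

    /** Role of doRecovery: exactly one caller wins the right to reconnect. */
    boolean tryStartRecovery() {
        return failed.compareAndSet(true, false);
    }

    /** Assumed extra hook: the reconnect attempt did not complete, so allow a retry. */
    void reconnectNotEstablished() {
        failed.set(true);
    }

    boolean isFailed() {
        return failed.get();
    }
}
```

With compareAndSet, a failure notification arriving from another thread cannot be lost between the check and the update of the flag, and only one thread can start a given recovery attempt.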