activemq-users mailing list archives

From Eric-AWL <eric.vinc...@atosorigin.com>
Subject Re: CountDownLatch problem . MultiCast Discovery and refusal of connection on DUPLEX connection (Fuse 5.3.0.5) .
Date Wed, 07 Jul 2010 16:04:04 GMT

We switched from a duplex configuration to a non-duplex configuration, and it
seemed to work better. But today, during a network problem (the link going
alternately up and down), our process did not survive.

We have:
- a thread dump showing 85 StartLocalBridge threads waiting on the same
latch in the DemandForwardingBridgeSupport.startLocalBridge() method:

 protected void startLocalBridge() throws Exception {
        if (localBridgeStarted.compareAndSet(false, true)) {
            synchronized (this) {
                if (LOG.isTraceEnabled()) {
                    LOG.trace(configuration.getBrokerName() + " starting local Bridge, localBroker=" + localBroker);
                }
                remoteBrokerNameKnownLatch.await();
                ...
}

- 960 connections in the CLOSE_WAIT state
- the file descriptor limit reached
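
To illustrate the first item: the sketch below is not ActiveMQ code. It is a
minimal, self-contained Java model of the two ideas discussed in the quoted
thread, namely counting the latch down in stop() and re-checking the started
flag after the await. All names other than remoteBrokerNameKnownLatch and
started (LatchBridgeSketch, Bridge, onRemoteBrokerInfo, runScenario) are
hypothetical, and the bounded await is an extra assumption of the sketch, not
something the real class is known to do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LatchBridgeSketch {

    // Hypothetical stand-in for a network bridge; NOT the ActiveMQ class.
    static class Bridge {
        private final CountDownLatch remoteBrokerNameKnownLatch = new CountDownLatch(1);
        private final AtomicBoolean started = new AtomicBoolean(false);

        // Returns true only if the remote side answered while we were still started.
        boolean start(long timeoutMs) throws InterruptedException {
            started.set(true);
            // Bounded wait (assumption): a lost answer cannot park the thread forever.
            boolean released = remoteBrokerNameKnownLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
            // Re-check the flag: the latch may have been released by stop().
            if (!released || !started.get()) {
                return false;
            }
            // The rest of the start sequence would run here.
            return true;
        }

        // Models the remote broker's answer arriving.
        void onRemoteBrokerInfo() {
            remoteBrokerNameKnownLatch.countDown();
        }

        // Clears the flag first, then releases any thread parked in start().
        void stop() {
            started.set(false);
            remoteBrokerNameKnownLatch.countDown();
        }
    }

    // Runs start() on a worker thread, then either stops the bridge or
    // delivers the remote answer, and reports what start() returned.
    static boolean runScenario(boolean stopInsteadOfAnswer) throws Exception {
        Bridge bridge = new Bridge();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = pool.submit(() -> bridge.start(5000));
            // Wait until the worker has entered start().
            while (!bridge.started.get()) {
                Thread.sleep(5);
            }
            if (stopInsteadOfAnswer) {
                bridge.stop();
            } else {
                bridge.onRemoteBrokerInfo();
            }
            return result.get(10, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stopped while waiting, start completed: " + runScenario(true));
        System.out.println("remote answered, start completed: " + runScenario(false));
    }
}
```

In this model, a start() that is interrupted by stop() returns false instead
of staying parked, while a start() that receives the remote answer returns
true. Whether the same change is safe inside DemandForwardingBridgeSupport is
exactly the open question of this thread.
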

Would the transport.closeAsync=false flag help here?

Eric-AWL



Gary Tully wrote:
> 
> Hi, as you can see, this is a complicated area of the code. The best
> approach is to try and produce a test case for your scenario. Take a
> look at the test: BrokerQueueNetworkWithDisconnectTest in
> activemq-core. This can simulate network failures and can use
> multicast (bridgeAllBrokers). Getting a reproducible test case is the
> best way to validate your changes and protect them into the future.
> 
> The only other alternative is to keep adding your suggestions to the
> jira issue (https://issues.apache.org/activemq/browse/AMQ-2774) and
> with a bit of luck I (or someone else) will have a chance to look at
> it before 5.4.
> 
> 
> On 6 July 2010 12:37, Eric-AWL <eric.vincent@atosorigin.com> wrote:
>>
>> I wonder whether this could have undesirable effects on both sides of the
>> duplex connection.
>>
>> Perhaps we should test the started AtomicBoolean in the start() method,
>> after the corresponding "await", and skip the rest of the start() method
>> if it is false?
>>
>>            if (configuration.isDuplex() && duplexInitiatingConnection == null) {
>>                // initiator side of duplex network
>>                remoteBrokerNameKnownLatch.await();
>>            }
>>
>> HERE ??? (if started.get()) { ???
>>
>>            try {
>>                triggerRemoteStartBridge();
>>            } catch (IOException e) {
>>                LOG.warn("Caught exception from remote start", e);
>>            }
>>            NetworkBridgeListener l = this.networkBridgeListener;
>>            if (l != null) {
>>                l.onStart(this);
>>            }
>>
>> This is the first big problem I have had with a complex ActiveMQ
>> configuration. It happens when the network is faulty (which is not very
>> often), and I don't know the ActiveMQ source code very well.
>>
>> Who could help me identify the potential effects of this change before I
>> try to make it? (I can't do that on my production system without some
>> tests and expert validation.)
>>
>> Eric-AWL
>>
>>
>> Gary Tully wrote:
>>>
>>> That seems reasonable. Want to submit a patch against trunk?
>>>
>>> On 6 July 2010 12:10, Eric-AWL <eric.vincent@atosorigin.com> wrote:
>>>>
>>>> What could happen if we add
>>>>
>>>>            if (configuration.isDuplex() && duplexInitiatingConnection == null) {
>>>>                // initiator side of duplex network
>>>>                remoteBrokerNameKnownLatch.countDown();
>>>>            }
>>>>
>>>> into the stop() method of DemandForwardingBridgeSupport class ?
>>>>
>>>> Eric-AWL
>>>>
>>>>
>>>> Eric-AWL wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>> I'm sure that I have identified a latch problem in the multicast
>>>>> network discovery mechanism on a duplex connection.
>>>>>
>>>>> The multicast notifier thread is blocked. Here is the trace:
>>>>>
>>>>> "Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]" daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition [0x0000000044c26000..0x0000000044c26b90]
>>>>>    java.lang.Thread.State: WAITING (parking)
>>>>>       at sun.misc.Unsafe.park(Native Method)
>>>>>       - parking to wait for  <0x00002aaab3dd66f0> (a java.util.concurrent.CountDownLatch$Sync)
>>>>>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
>>>>>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>>>>>       at org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
>>>>>       at org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
>>>>>       at org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>       at java.lang.Thread.run(Thread.java:619)
>>>>>
>>>>> The problem appears when the network between the two components goes
>>>>> rapidly and alternately up and down.
>>>>> The bridge is created in one direction, but the answer cannot be
>>>>> received.
>>>>>
>>>>> The thread is blocked on the CountDownLatch. Even if multicast frames
>>>>> are received, the component cannot establish a new network connection.
>>>>>
>>>>> Here is the corresponding ActiveMQ trace.
>>>>>
>>>>> When it works:
>>>>> 2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO
>>>>>  DiscoveryNetworkConnector
>>>>> - Establishing network connection from
>>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>>> 2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge
>>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160
>>>>> and
>>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>>>>> has been established.
>>>>>
>>>>> 2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>>>>> SIBBusSupervisor-tpnocp11v-bus stopped
>>>>>
>>>>> 2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO
>>>>>  DiscoveryNetworkConnector
>>>>> - Establishing network connection from
>>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>>> 2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge
>>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162
>>>>> and
>>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
>>>>> has been established.
>>>>>
>>>>> 2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge
>>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to
>>>>> SIBBusSupervisor-tpnocp11v-bus stopped
>>>>>
>>>>> When it fails (the remote broker name is "Unknown"):
>>>>>
>>>>> 2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO
>>>>>  DiscoveryNetworkConnector
>>>>> - Establishing network connection from
>>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to
>>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>>> 2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge
>>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164
>>>>> and
>>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to a
>>>>> remote error: java.net.SocketException: Connection reset
>>>>> 2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge
>>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped
>>>>>
>>>>>
>>>>> Here is the corresponding ActiveMQ trace from the other side:
>>>>>
>>>>> activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO
>>>>> TransportConnection            - Created Duplex Bridge back to
>>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>>>
>>>>> activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO
>>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>>> to
>>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>>
>>>>> activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO
>>>>> TransportConnection            - Created Duplex Bridge back to
>>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>>> activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO
>>>>> DemandForwardingBridge         - Network connection between
>>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1770 and
>>>>> tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>>>>> established.
>>>>>
>>>>> activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO
>>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>>> to
>>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>>
>>>>> activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO
>>>>> TransportConnection            - Created Duplex Bridge back to
>>>>> SIBBusModule-NOCP5-tpnocp08s-bus
>>>>> activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO
>>>>> DemandForwardingBridge         - Network connection between
>>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1806 and
>>>>> tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been
>>>>> established.
>>>>>
>>>>> activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO
>>>>> DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge
>>>>> to
>>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>>
>>>>>
>>>>> Eric-AWL
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084235.html
>>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> http://blog.garytully.com
>>>
>>> Open Source Integration
>>> http://fusesource.com
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084410.html
>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>
>>
> 

-- 
View this message in context: http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29098137.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

