activemq-users mailing list archives

From Dejan Bosanac <de...@nighttale.net>
Subject Re: FailoverTransport stops working after a while
Date Mon, 06 Apr 2009 08:57:46 GMT
Hi Norbert,

this sounds like a different problem than this one. Take a look at
http://issues.apache.org/activemq/browse/AMQ-2149, which is being worked on,
and give 5.3-SNAPSHOT a try.

Cheers
--
Dejan Bosanac

Open Source Integration - http://fusesource.com/
ActiveMQ in Action - http://www.manning.com/snyder/
Blog - http://www.nighttale.net


On Mon, Apr 6, 2009 at 9:06 AM, Norbert Pfistner <norbert.pfistner@picturesafe.de> wrote:

> Hello Murty,
>
> We see the same problem when using failover: sometimes clients stop
> working after a slave has become the master and they have processed a
> batch of messages with this new master.
> And yes, we use 5.1. We also did some testing with 5.2, unfortunately with
> the same result, so it looks like 5.2 suffers from the same bug.
> For now we do not use failover in our production environment because the
> feature is unreliable.
>
> It would be great if this bug were fixed.
>
> Greetings,
> Norbert
>
>
> Murty Dasari wrote:
>
>  Thanks Dejan for the reply.
>>
>> I haven't tried 5.2 yet; I wanted to get confirmation of the issue before
>> pushing the new version to our servers (that is a somewhat lengthy
>> process). I looked at the 5.2 source code and I suspect the problem is
>> still there.
>>
>> I'm surprised that others are not running into any issues with it, so
>> maybe there is something wrong with my topology and setup. Does the
>> following setup look right?
>>
>> 1. We have a bunch of applications posting messages to a local
>> (localhost) AMQ. (We have several boxes like this.)
>> 2. We set up a Camel route to deliver the messages to a central AMQ host
>> with a durable subscription. (There is only one box like this.)
>>
>> ----------------------------------------------------------------
>>    <camelContext>
>>        <route>
>>            <from uri="LOCALMQ:topic:Topic1?clientId=prod1-Topic1&amp;durableSubscriptionName=prod1-Topic1&amp;subscriptionDurable=true"/>
>>            <to uri="CENTRALMQ:topic:Topic1"/>
>>        </route>
>>        ...... Few other routes
>>    </camelContext>
>>
>>    <bean id="LOCALMQ" class="org.apache.camel.component.jms.JmsComponent">
>>        <property name="connectionFactory">
>>            <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>>                <property name="brokerURL" value="vm://LOCALMQ?broker.persistent=false" />
>>            </bean>
>>        </property>
>>    </bean>
>>    <bean id="CENTRALMQ" class="org.apache.camel.component.jms.JmsComponent">
>>        <property name="connectionFactory">
>>            <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>>                <property name="brokerURL" value="failover://(tcp://10.87.129.196:61616,tcp://10.87.129.196:61616)?initialReconnectDelay=100" />
>>            </bean>
>>        </property>
>>    </bean>
>> -----------------------------------------
>>
>> The main difference from other configs I have seen is that we use failover
>> with two identical endpoints. With this model we get retries between
>> LOCALMQ and CENTRALMQ whenever there are connection problems. We need
>> retries but not really failover (i.e., sending to a secondary if the
>> primary is down), because messages would still be in LOCALMQ if there
>> were connectivity problems.
>>
>> Is there any other way to achieve retries without using the "failover
>> transport"?
>>
>> thanks for your time.
>>
>> cheers
>> - mdasari
>>
>> On Fri, Apr 3, 2009 at 12:36 AM, Dejan Bosanac <dejan@nighttale.net>
>> wrote:
>>
>>  Hi,
>>>
>>> did you try version 5.2.0? Some of those issues have probably been
>>> addressed already.
>>>
>>> Cheers
>>> --
>>> Dejan Bosanac
>>>
>>> Open Source Integration - http://fusesource.com/
>>> ActiveMQ in Action - http://www.manning.com/snyder/
>>> Blog - http://www.nighttale.net
>>>
>>>
>>> On Thu, Apr 2, 2009 at 6:47 PM, mdasari <mdasari@gmail.com> wrote:
>>>
>>>  Hi,
>>>>
>>>> We are using AMQ 5.1.0 on some of our servers. We noticed that on a few
>>>> servers the AMQ failover transport stops working after a while, so
>>>> messages are no longer delivered (from a producer AMQ server box to a
>>>> central consumer AMQ server box through Camel).
>>>>
>>>> --------------------------------------------------------------
>>>> The following is the data from our log files:
>>>> --------------------------------------------------------------
>>>> INFO   | jvm 1    | 2009/03/16 21:25:42 | DEBUG FailoverTransport
>>>> - Connection established
>>>> INFO   | jvm 1    | 2009/03/16 21:25:42 | INFO  FailoverTransport
>>>> - Successfully connected to tcp://10.87.129.196:61616
>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsConfiguration$2
>>>> - Executing callback on JMS Session: ActiveMQSession
>>>> {id=ID:LOCALMQ-3675-1236961500048-2:218:1,started=false}
>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsProducer
>>>> - Endpoint[centralMQ:topic:Topic1] sending JMS message:
>>>> ActiveMQTextMessage {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsConfiguration$2
>>>> - Sending created message: ActiveMQTextMessage {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG ActiveMQSession
>>>> - ID:LOCALMQ-3675-1236961500048-2:218:1 sending message:
>>>> ActiveMQTextMessage
>>>> {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG FailoverTransport
>>>> - Stopped.
>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG TcpTransport
>>>> - Stopping transport tcp:///10.87.129.196:61616
>>>> INFO   | jvm 1    | 2009/03/16 21:26:00 | DEBUG AMQPersistenceAdapter
>>>> - Checkpoint started.
>>>> INFO   | jvm 1    | 2009/03/16 21:26:00 | DEBUG AMQPersistenceAdapter
>>>> - Checkpoint done.
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG ActiveMQMessageConsumer
>>>> - ID:LOCALMQ-3675-1236961500048-2:0:1:1 received message:
>>>> MessageDispatch
>>>> {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG EndpointMessageListener
>>>> - Endpoint[localMQ:topic:Topic1?clientId=...&subscriptionDurable=true]
>>>> receiving JMS message: ActiveMQTextMessage {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>> - Waking up reconnect task
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>> - Started.
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>> - Waking up reconnect task
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>> - Attempting connect to: tcp://10.87.129.196:61616
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator
>>>> - Sending: WireFormatInfo { version=3, properties={CacheSize=1024,
>>>> CacheEnabled=true, SizePrefixDisabled=false,
>>>> MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true,
>>>> MaxInactivityDuration=30000, TightEncodingEnabled=true,
>>>> StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator
>>>> - Received WireFormat: WireFormatInfo { version=3,
>>>> properties={CacheSize=1024, CacheEnabled=true, SizePrefixDisabled=false,
>>>> MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true,
>>>> MaxInactivityDuration=30000, TightEncodingEnabled=true,
>>>> StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator
>>>> - tcp:///10.87.129.196:61616 before negotiation:
>>>> OpenWireFormat{version=3, cacheEnabled=false, stackTraceEnabled=false,
>>>> tightEncodingEnabled=false, sizePrefixDisabled=false}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator
>>>> - tcp:///10.87.129.196:61616 after negotiation:
>>>> OpenWireFormat{version=3, cacheEnabled=true, stackTraceEnabled=true,
>>>> tightEncodingEnabled=true, sizePrefixDisabled=false}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>> - Connection established
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | INFO  FailoverTransport
>>>> - Successfully connected to tcp://10.87.129.196:61616
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsConfiguration$2
>>>> - Executing callback on JMS Session: ActiveMQSession
>>>> {id=ID:LOCALMQ-3675-1236961500048-2:219:1,started=false}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsProducer
>>>> - Endpoint[centralMQ:topic:Topic1] sending JMS message:
>>>> ActiveMQTextMessage {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsConfiguration$2
>>>> - Sending created message: ActiveMQTextMessage {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG ActiveMQSession
>>>> - ID:LOCALMQ-3675-1236961500048-2:219:1 sending message:
>>>> ActiveMQTextMessage
>>>> {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>> - Stopped.
>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG TcpTransport
>>>> - Stopping transport tcp:///10.87.129.196:61616
>>>> INFO   | jvm 1    | 2009/03/16 21:26:14 | DEBUG ActiveMQMessageConsumer
>>>> - ID:LOCALMQ-3675-1236961500048-2:0:1:1 received message:
>>>> MessageDispatch
>>>> {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:14 | DEBUG EndpointMessageListener
>>>> - Endpoint[localmq:topic:Topic1?clientId=...&subscriptionDurable=true]
>>>> receiving JMS message: ActiveMQTextMessage {...}
>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport
>>>> - Waiting 10 ms before attempting connection.
>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | Exception in thread "ActiveMQ
>>>> Failover Worker: 1889455" java.lang.NullPointerException
>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 |       at
>>>> org.apache.activemq.transport.failover.FailoverTransport$2.iterate(FailoverTransport.java:124)
>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 |       at
>>>> org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 |       at
>>>> org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport
>>>> - Waking up reconnect task
>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport
>>>> - Started.
>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport
>>>> - Waking up reconnect task
>>>> INFO   | jvm 1    | 2009/03/16 21:27:00 | DEBUG AMQPersistenceAdapter
>>>> - Checkpoint started.
>>>> INFO   | jvm 1    | 2009/03/16 21:27:00 | DEBUG AMQPersistenceAdapter
>>>> - Checkpoint done.
>>>> INFO   | jvm 1    | 2009/03/16 21:28:00 | DEBUG AMQPersistenceAdapter
>>>> - Checkpoint started.
>>>> ---------------------------------------------
>>>>
>>>>
>>>> Basically, it was able to deliver one message (and a few more before
>>>> that), but for another message sent very shortly afterwards it ran into
>>>> a NullPointerException, and after that it stopped functioning entirely.
>>>>
>>>> I took a brief look at the FailoverTransport.java code. I'm not an expert
>>>> on the AMQ code, but I suspect that the reconnectTask member variable in
>>>> FailoverTransport.java is used by the task-runner thread before it has
>>>> been completely initialized (basically a race condition without proper
>>>> synchronization).
>>>>
>>>> I can provide more details on our network topology if required. I
>>>> searched around but didn't find any related issues or bugs. Does anyone
>>>> know whether this is a known issue, and in which version it is going to
>>>> be addressed? If not, I'll open a JIRA.
>>>>
>>>> Appreciate your help.
>>>>
>>>> cheers
>>>> - mdasari
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/FailoverTransport-stops-working-after-a-while-tp22851122p22851122.html
>>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>>>
>>>>
>>>>
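
Editorial note on the suspected race: the original report guesses that a worker thread uses the reconnectTask field before it has been assigned. The following is only a minimal, self-contained Java sketch of that general pattern, not ActiveMQ code. The class name ReconnectRaceSketch, the method runWorker, and the use of a plain Runnable for reconnectTask are all assumptions made for illustration, and the bad interleaving is forced deterministically rather than left to timing. Whether this is what actually happens at FailoverTransport.java:124 is not confirmed anywhere in this thread.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ReconnectRaceSketch {

    // Hypothetical stand-in for the field named in the report; not the real ActiveMQ type.
    private Runnable reconnectTask;

    /**
     * Starts a worker thread that calls reconnectTask and returns whatever the
     * worker threw (null if nothing was thrown).
     * assignFirst = true  : the field is assigned before the worker starts.
     * assignFirst = false : the field is assigned only after the worker has
     *                       finished, which forces the bad ordering.
     */
    static Throwable runWorker(boolean assignFirst) {
        ReconnectRaceSketch transport = new ReconnectRaceSketch();
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                transport.reconnectTask.run(); // NullPointerException if still unassigned
            } catch (Throwable t) {
                failure.set(t);
            }
        }, "Failover Worker (sketch)");

        if (assignFirst) {
            // Writes made before Thread.start() are visible to the started thread.
            transport.reconnectTask = () -> { };
        }
        worker.start();
        join(worker);
        if (!assignFirst) {
            // Too late: the worker has already dereferenced the field.
            transport.reconnectTask = () -> { };
        }
        return failure.get();
    }

    // Waits for the worker without forcing callers to handle InterruptedException.
    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("assigned after start:  " + runWorker(false));
        System.out.println("assigned before start: " + runWorker(true));
    }
}
```

In the sketch, assigning the field before the thread is started is enough to avoid the failure. In real code the same effect is usually obtained by making the field final and setting it in the constructor, or by guarding both the assignment and the use with the same lock.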
