activemq-users mailing list archives

From Norbert Pfistner <npfist...@picturesafe.de>
Subject Re: FailoverTransport stops working after a while
Date Mon, 06 Apr 2009 09:52:14 GMT
Hi Dejan,

We use JDBC-based persistent messages, but we'll give 5.3-SNAPSHOT a try
anyway.

Thanks for the hint.

Greetings,
Norbert

Dejan Bosanac wrote:
> Hi Norbert,
> 
> This sounds like a different problem from the one in this thread. Take a look at
> http://issues.apache.org/activemq/browse/AMQ-2149 which is being worked on
> and give 5.3-SNAPSHOT a try.
> 
> Cheers
> --
> Dejan Bosanac
> 
> Open Source Integration - http://fusesource.com/
> ActiveMQ in Action - http://www.manning.com/snyder/
> Blog - http://www.nighttale.net
> 
> 
> On Mon, Apr 6, 2009 at 9:06 AM, Norbert Pfistner <
> norbert.pfistner@picturesafe.de> wrote:
> 
>> Hello Murty,
>>
>> We see the same problem when using failover: sometimes clients stop
>> working after a slave has become the master and a bunch of messages have
>> been processed with this new master.
>> And yes, we use 5.1. We also did some testing with 5.2, unfortunately with
>> the same result, so it looks like 5.2 suffers from the same bug.
>> Currently we do not use failover in our production environment because the
>> feature is unreliable.
>>
>> It would be great if this bug were fixed.
>>
>> Greetings,
>> Norbert
>>
>>
>> Murty Dasari wrote:
>>
>>> Thanks for the reply, Dejan.
>>> I have not tried 5.2 yet; I wanted confirmation of the issue before
>>> pushing the new version to our servers (that is a somewhat lengthy
>>> process). I looked at the 5.2 source code and I suspect the problem is
>>> still there.
>>>
>>> I'm surprised that others are not running into this issue; maybe there is
>>> something wrong with my topology and setup. Does the following setup look
>>> right?
>>>
>>> 1. We have a bunch of applications posting messages to a local
>>> (localhost) AMQ. (We have several boxes like this.)
>>> 2. We set up a Camel route to deliver the messages to a central AMQ host
>>> with a durable subscription. (There is only one box like this.)
>>>
>>> ----------------------------------------------------------------
>>> <camelContext>
>>>     <route>
>>>         <from uri="LOCALMQ:topic:Topic1?clientId=prod1-Topic1&amp;durableSubscriptionName=prod1-Topic1&amp;subscriptionDurable=true"/>
>>>         <to uri="CENTRALMQ:topic:Topic1"/>
>>>     </route>
>>>     <!-- ... a few other routes ... -->
>>> </camelContext>
>>>
>>> <bean id="LOCALMQ" class="org.apache.camel.component.jms.JmsComponent">
>>>     <property name="connectionFactory">
>>>         <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>>>             <property name="brokerURL" value="vm://LOCALMQ?broker.persistent=false"/>
>>>         </bean>
>>>     </property>
>>> </bean>
>>> <bean id="CENTRALMQ" class="org.apache.camel.component.jms.JmsComponent">
>>>     <property name="connectionFactory">
>>>         <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>>>             <property name="brokerURL" value="failover://(tcp://10.87.129.196:61616,tcp://10.87.129.196:61616)?initialReconnectDelay=100"/>
>>>         </bean>
>>>     </property>
>>> </bean>
>>> -----------------------------------------
>>>
>>> The main difference from other configs I have seen is that we use failover
>>> with two endpoints that are the same. With this model we were able to
>>> get retries between LOCALMQ and CENTRALMQ when there were connection
>>> problems. We need retries but not really failover (i.e., sending to a
>>> secondary if the primary is down), as messages would still be in LOCALMQ
>>> if there were connectivity problems.
>>>
>>> Is there any other way to achieve retries without using the failover
>>> transport?
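
One possible answer to the question above is to retry the send in application code instead of relying on the transport. The following is only a minimal sketch in plain Java under that assumption: `RetrySketch`, `withRetries`, and the parameters `maxAttempts` and `initialDelayMs` are hypothetical names and are not part of ActiveMQ or Camel. The `Callable` stands in for whatever operation forwards a message to the central broker.

```java
import java.util.concurrent.Callable;

public class RetrySketch {

    /**
     * Runs the action and retries it when it throws, up to maxAttempts times.
     * The wait between attempts doubles after each failure (simple
     * exponential backoff). The last exception is rethrown if all attempts fail.
     */
    public static <T> T withRetries(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

Because the messages stay in LOCALMQ until they are forwarded, a loop like this only has to decide how long to keep trying; it does not need a second broker address.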
>>>
>>> Thanks for your time.
>>>
>>> cheers
>>> - mdasari
>>>
>>> On Fri, Apr 3, 2009 at 12:36 AM, Dejan Bosanac <dejan@nighttale.net>
>>> wrote:
>>>
>>>> Hi,
>>>> did you try version 5.2.0? Some of those issues have probably already been
>>>> addressed.
>>>>
>>>> Cheers
>>>> --
>>>> Dejan Bosanac
>>>>
>>>> Open Source Integration - http://fusesource.com/
>>>> ActiveMQ in Action - http://www.manning.com/snyder/
>>>> Blog - http://www.nighttale.net
>>>>
>>>>
>>>> On Thu, Apr 2, 2009 at 6:47 PM, mdasari <mdasari@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> We are using AMQ 5.1.0 on some of our servers. We noticed that (on a few
>>>>> servers) the AMQ failover transport stops working after a while, so
>>>>> messages are no longer delivered (from a producer AMQ server box to a
>>>>> central consumer AMQ server box through Camel).
>>>>>
>>>>> --------------------------------------------------------------
>>>>> The following is the data from our log files:
>>>>> --------------------------------------------------------------
>>>>> INFO   | jvm 1    | 2009/03/16 21:25:42 | DEBUG FailoverTransport
>>>>> - Connection established
>>>>> INFO   | jvm 1    | 2009/03/16 21:25:42 | INFO  FailoverTransport
>>>>> - Successfully connected to tcp://10.87.129.196:61616
>>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsConfiguration$2
>>>>> - Executing callback on JMS Session: ActiveMQSession
>>>>> {id=ID:LOCALMQ-3675-1236961500048-2:218:1,started=false}
>>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsProducer
>>>>> - Endpoint[centralMQ:topic:Topic1] sending JMS message:
>>>>> ActiveMQTextMessage
>>>>> {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsConfiguration$2
>>>>> - Sending created message: ActiveMQTextMessage {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG ActiveMQSession
>>>>> - ID:LOCALMQ-3675-1236961500048-2:218:1 sending message:
>>>>> ActiveMQTextMessage
>>>>> {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG FailoverTransport
>>>>> - Stopped.
>>>>> INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG TcpTransport
>>>>> - Stopping transport tcp:///10.87.129.196:61616
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:00 | DEBUG AMQPersistenceAdapter
>>>>> - Checkpoint started.
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:00 | DEBUG AMQPersistenceAdapter
>>>>> - Checkpoint done.
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG ActiveMQMessageConsumer
>>>>> - ID:LOCALMQ-3675-1236961500048-2:0:1:1 received message:
>>>>> MessageDispatch
>>>>> {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG EndpointMessageListener
>>>>> - Endpoint[localMQ:topic:Topic1?clientId=...&subscriptionDurable=true]
>>>>> receiving JMS message: ActiveMQTextMessage {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>>> - Waking up reconnect task
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>>> - Started.
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>>> - Waking up reconnect task
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>>> - Attempting connect to: tcp://10.87.129.196:61616
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator
>>>>> - Sending: WireFormatInfo { version=3, properties={CacheSize=1024,
>>>>> CacheEnabled=true, SizePrefixDisabled=false,
>>>>> MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true,
>>>>> MaxInactivityDuration=30000, TightEncodingEnabled=true,
>>>>> StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator
>>>>> - Received WireFormat: WireFormatInfo { version=3,
>>>>> properties={CacheSize=1024, CacheEnabled=true, SizePrefixDisabled=false,
>>>>> MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true,
>>>>> MaxInactivityDuration=30000, TightEncodingEnabled=true,
>>>>> StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator
>>>>> - tcp:///10.87.129.196:61616 before negotiation:
>>>>> OpenWireFormat{version=3,
>>>>> cacheEnabled=false, stackTraceEnabled=false, tightEncodingEnabled=false,
>>>>> sizePrefixDisabled=false}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator
>>>>> - tcp:///10.87.129.196:61616 after negotiation:
>>>>> OpenWireFormat{version=3,
>>>>> cacheEnabled=true, stackTraceEnabled=true, tightEncodingEnabled=true,
>>>>> sizePrefixDisabled=false}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>>> - Connection established
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | INFO  FailoverTransport
>>>>> - Successfully connected to tcp://10.87.129.196:61616
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsConfiguration$2
>>>>> - Executing callback on JMS Session: ActiveMQSession
>>>>> {id=ID:LOCALMQ-3675-1236961500048-2:219:1,started=false}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsProducer
>>>>> - Endpoint[centralMQ:topic:Topic1] sending JMS message:
>>>>> ActiveMQTextMessage
>>>>> {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsConfiguration$2
>>>>> - Sending created message: ActiveMQTextMessage {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG ActiveMQSession
>>>>> - ID:LOCALMQ-3675-1236961500048-2:219:1 sending message:
>>>>> ActiveMQTextMessage
>>>>> {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport
>>>>> - Stopped.
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG TcpTransport
>>>>> - Stopping transport tcp:///10.87.129.196:61616
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:14 | DEBUG ActiveMQMessageConsumer
>>>>> - ID:LOCALMQ-3675-1236961500048-2:0:1:1 received message:
>>>>> MessageDispatch
>>>>> {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:14 | DEBUG EndpointMessageListener
>>>>> - Endpoint[localmq:topic:Topic1?clientId=...&subscriptionDurable=true]
>>>>> receiving JMS message: ActiveMQTextMessage {...}
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport
>>>>> - Waiting 10 ms before attempting connection.
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | Exception in thread "ActiveMQ
>>>>> Failover Worker: 1889455" java.lang.NullPointerException
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 |       at
>>>>> org.apache.activemq.transport.failover.FailoverTransport$2.iterate(FailoverTransport.java:124)
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 |       at
>>>>> org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 |       at
>>>>> org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport
>>>>> - Waking up reconnect task
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport
>>>>> - Started.
>>>>> INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport
>>>>> - Waking up reconnect task
>>>>> INFO   | jvm 1    | 2009/03/16 21:27:00 | DEBUG AMQPersistenceAdapter
>>>>> - Checkpoint started.
>>>>> INFO   | jvm 1    | 2009/03/16 21:27:00 | DEBUG AMQPersistenceAdapter
>>>>> - Checkpoint done.
>>>>> INFO   | jvm 1    | 2009/03/16 21:28:00 | DEBUG AMQPersistenceAdapter
>>>>> - Checkpoint started.
>>>>> ---------------------------------------------
>>>>>
>>>>>
>>>>> Basically, it was able to deliver a message (and a few more before
>>>>> that), but for another message sent very shortly after the previous one
>>>>> it runs into a NullPointerException, and after that it stops
>>>>> functioning entirely.
>>>>>
>>>>> I took a brief look at the FailoverTransport.java code. I'm not an expert
>>>>> on the AMQ code, but I suspect that the reconnectTask member variable of
>>>>> FailoverTransport is being used by the task-runner thread before it is
>>>>> completely initialized (basically a race condition without proper
>>>>> synchronization).
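
The kind of initialization race suspected above can be illustrated in plain Java. This is not the FailoverTransport source; `InitOrderSketch`, `TaskHandle`, `UnsafeTransport`, and `SafeTransport` are hypothetical classes, and the field name `reconnectTask` is reused only for illustration. The sketch shows a field that is still null when another caller reaches it, and one common remedy: assigning a `final` field during construction.

```java
public class InitOrderSketch {

    /** Hypothetical stand-in for a task-runner handle. */
    static final class TaskHandle {
        private int wakeups;
        synchronized void wakeup() { wakeups++; }
        synchronized int wakeups() { return wakeups; }
    }

    /** Unsafe: the field is assigned only some time after the object exists. */
    static final class UnsafeTransport {
        private TaskHandle reconnectTask;           // null until init() runs

        void init() { reconnectTask = new TaskHandle(); }

        // Throws NullPointerException if called before init() has run.
        void iterate() { reconnectTask.wakeup(); }
    }

    /**
     * Safer: the field is final and assigned during construction, so it is
     * never null for callers, provided `this` does not escape the constructor.
     */
    static final class SafeTransport {
        private final TaskHandle reconnectTask = new TaskHandle();

        void iterate() { reconnectTask.wakeup(); }

        int wakeups() { return reconnectTask.wakeups(); }
    }

    /** Returns true if calling iterate() before init() fails with an NPE. */
    static boolean unsafeFailsBeforeInit() {
        UnsafeTransport t = new UnsafeTransport();
        try {
            t.iterate();                             // simulates the worker winning the race
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

The sketch forces the bad ordering deterministically; in a real race the same ordering would occur only occasionally, which would fit a failure that appears "after a while".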
>>>>>
>>>>> I can provide more details on our network topology if required. I
>>>>> searched around but didn't find any related issues or bugs. Does anyone
>>>>> know whether this is a known issue, and in which version it will be
>>>>> addressed? If not, I'll open a JIRA.
>>>>>
>>>>> Appreciate your help.
>>>>>
>>>>> cheers
>>>>> - mdasari
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/FailoverTransport-stops-working-after-a-while-tp22851122p22851122.html
>>>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>>

-- 

Dipl.-Ing. Norbert Pfistner
Software Development

picturesafe GmbH
Simon-von-Utrecht-Straße 31-37
D-20359 Hamburg
http://www.picturesafe.de

phone: +49 40 374127 901
fax: +49 40 374127 999
npfistner@picturesafe.de

Registered office: Hannover
Managing Director: Herbert Wirth
Commercial register: Amtsgericht Hannover HR B 53 366
