python (JIRA)
[jira] Commented: (AMQCPP-165) Core Dump on reconnect/open queue
Wed, 18 Jun 2008 16:03:00 GMT


python commented on AMQCPP-165:

I have produced (and reproduced) a similar error using activemq-cpp-2.1.3 on Windows XP / Windows Server 2003.

ActiveMQ Broker 5.1

Backtrace (VS2005):
* resource=0x029f0678)  Line 1195 + 0x25 bytes C++
activemq::connector::BaseConnectorResource::close()  Line 64     C++
activemq::connector::openwire::OpenWireSessionInfo::~OpenWireSessionInfo()  Line 57    C++
activemq::connector::openwire::OpenWireSessionInfo::`scalar deleting destructor'()  + 0xf bytes    C++
ackMode=AUTO_ACKNOWLEDGE)  Line 282 + 0x32 bytes        C++
activemq::core::ActiveMQConnection::createSession(cms::Session::AcknowledgeMode ackMode=AUTO_ACKNOWLEDGE)  Line 98 + 0x8b bytes          C++

A null pointer dereference occurs on this line:
            dataStructure = session->getSessionInfo()->getSessionId();
The session object is fine, but the getSessionInfo() call returns NULL.

Steps to reproduce:
1-Three activemq-cpp clients (within the same process) connect to a broker.
2-Three queues are used to send many messages per second to the broker.
3-While the connection is active, the broker runs out of disk space and then memory.
4-Continuous attempts to reconnect to the broker fail, and may eventually produce the above
error (this can take several hours, depending on the frequency of reconnect attempts).

Since producing this error can be difficult, it's easier to just look at the code:

1: In OpenWireConnector::createSession(), syncRequest(info) throws an exception. Note that
session->setSessionInfo( info ); is never called.
2: The exception handler calls: delete session;
3: The OpenWireSessionInfo object's destructor runs and calls BaseConnectorResource::close().
4: close() then calls connector->closeResource( this ); (OpenWireConnector::closeResource()),
which tries to access the resource's sessionInfo. Since the sessionInfo has not been set
yet, we have a crash.
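
The sequence above can be modeled in a few lines. This is only a simplified sketch, not the activemq-cpp source: SessionInfo, SessionResource, Connector, failingSyncRequest() and reproduceNullInfoOnClose() are hypothetical stand-ins, and the model records what closeResource() would see instead of dereferencing the pointer.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the activemq-cpp classes named above.
struct SessionInfo { long long sessionId = 0; };

struct Connector;

struct SessionResource {
    Connector* connector;
    SessionInfo* info = nullptr;   // stays null until setSessionInfo() runs
    explicit SessionResource(Connector* c) : connector(c) {}
    void setSessionInfo(SessionInfo* i) { info = i; }
    SessionInfo* getSessionInfo() const { return info; }
    ~SessionResource();            // mirrors the destructor -> close() step
};

struct Connector {
    bool sawNullInfo = false;

    // Stand-in for syncRequest(info) failing while the broker is unreachable.
    void failingSyncRequest() { throw std::runtime_error("no response from broker"); }

    // Records the state that the real closeResource() would dereference.
    void closeResource(SessionResource* r) {
        if (r->getSessionInfo() == nullptr) sawNullInfo = true;
    }

    void createSession() {
        SessionInfo info;
        SessionResource* session = new SessionResource(this);
        try {
            failingSyncRequest();              // step 1: throws
            session->setSessionInfo(&info);    // never reached
        } catch (...) {
            delete session;                    // step 2: runs the destructor (steps 3 and 4)
            throw;
        }
        delete session;
    }
};

SessionResource::~SessionResource() { connector->closeResource(this); }

// Returns true if closeResource() was reached with a null sessionInfo.
bool reproduceNullInfoOnClose() {
    Connector c;
    try { c.createSession(); } catch (const std::runtime_error&) {}
    return c.sawNullInfo;
}
```

In this model reproduceNullInfoOnClose() returns true, which corresponds to the state in which the real code dereferences the NULL sessionInfo.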

We fixed this by returning immediately from closeResource() if getSessionInfo() returns NULL.
Perhaps this could instead be fixed by updating the OpenWireConnector::state. Not too sure...
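
A minimal sketch of that guard, assuming simplified stand-in types (SessionInfo, SessionResource and this free-standing closeResource() are hypothetical, not the activemq-cpp classes). It returns a string only so that the action taken can be observed:

```cpp
#include <string>

// Hypothetical stand-ins; the real classes live in activemq::connector::openwire.
struct SessionInfo { long long sessionId; };

struct SessionResource {
    const SessionInfo* info = nullptr;   // null until the session info is set
    const SessionInfo* getSessionInfo() const { return info; }
};

// Returns a description of the action taken.
std::string closeResource(const SessionResource* session) {
    // Guard: if the session info was never set, return immediately
    // instead of dereferencing a null pointer.
    if (session == nullptr || session->getSessionInfo() == nullptr) {
        return "skipped";
    }
    long long id = session->getSessionInfo()->sessionId;
    return "closed session " + std::to_string(id);
}
```

Whether an early return is sufficient in the real connector depends on whether anything else was registered before syncRequest() failed; that would need to be checked against the source.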

Also, from looking at the code, it doesn't look like this is fixed in 2.2. I have not tested it.

> our activemq application core dumped several times during the last days when the connection
> to the broker was lost. each time it was caused either by the broker being restarted or by write
> attempts failing (see exception below).
> in both cases the application catches a CMS exception, closes all queues and tries to
> re-open them after 60s. all activemq objects are destroyed after closing (see cleanup() from
> web example).
> the core dumps seemed to happen when the application tries to re-open the connection,
> but fails because the broker is still unreachable. here is the backtrace:
> <quote>
> #0  activemq::connector::openwire::OpenWireConnector::closeResource (this=0x8b4a268,
resource=0x8b4dde0) at activemq/connector/openwire/OpenWireConnector.cpp:1200
> #1  0x080da6fc in activemq::connector::BaseConnectorResource::close (this=0x8b4dde0)
at activemq/connector/BaseConnectorResource.cpp:59
> #2  0x0812ff50 in ~OpenWireSessionInfo (this=0x8b4dde0) at OpenWireSessionInfo.h:56
> #3  0x0812d0c4 in activemq::connector::openwire::OpenWireConnector::createSession (this=0x8b4dde0,
>     at activemq/connector/openwire/OpenWireConnector.cpp:281
> #4  0x080e86c1 in activemq::core::ActiveMQConnection::createSession (this=0x8b4ded0,
ackMode=137247624) at activemq/core/ActiveMQConnection.cpp:98
> #5  0x08059c19 in ActiveMqQueue::open (this=0x8b1d6b0, aQueueName=0x8ab925c "outqueue",
aMode=ActiveMqQueue::modeWrite, aListenMode=0) at
> </quote>
> Debugging shows that at activemq/connector/openwire/OpenWireConnector.cpp:1200
> 1200:  dataStructure = session->getSessionInfo()->getSessionId();
> the session object is null; the previously dynamic-cast resource object, however, is not
> <quote>
> (gdb) p session
> $1 = (activemq::connector::openwire::OpenWireSessionInfo *) 0x0
> (gdb) p resource
> $2 = (class activemq::connector::ConnectorResource *) 0x8b4dde0</quote>
> (corrupt memory?)
> Exception when write attempts fail:
> <quote>No valid response received for command: Begin Class = ActiveMQTextMessage
Begin Class = ActiveMQMessageBase  Value of ackHandler = 0  Value of redeliveryCount = 0 
Value of properties = Begin Class PrimitiveMap: Begin Class PrimitiveMap:  Begin Class = Message
 Value of Message::ID_MESSAGE = 0  Value of ProducerId is Below: Begin Class = ProducerId
 Value of ProducerId::ID_PRODUCERID = 123  Value of ConnectionId = 0c00f32b-2269-4e0f-ace1-13fd0414b4b5
 Value of Value = 0  Value of SessionId = 0 No Data for Class BaseDataStructure End Class
= ProducerId   Value of Destination is Below: Begin Class = ActiveMQQueue Begin Class = ActiveMQDestination
 Value of exclusive = false  Value of ordered = false  Value of advisory = false  Value of
orderedTarget = coordinator  Value of physicalName = ffs_out  Value of options = Begin Class
activemq::util::Properties: End Class activemq::util::Properties:  No Data for Class BaseDataStructure
End Class = ActiveMQDestination End Class = ActiveMQQueue   Value of TransactionId is Below:
   Object is NULL  Value of OriginalDestination is Below:    Object is NULL  Value of MessageId
is Below: Begin Class = MessageId  Value of MessageId::ID_MESSAGEID = 110  Value of ProducerId
is Below: Begin Class = ProducerId  Value of ProducerId::ID_PRODUCERID = 123  Value of ConnectionId
= 0c00f32b-2269-4e0f-ace1-13fd0414b4b5  Value of Value = 0  Value of SessionId = 0 No Data
for Class BaseDataStructure End Class = ProducerId   Value of ProducerSequenceId = 4  Value
of BrokerSequenceId = 0 No Data for Class BaseDataStructure End Class = MessageId   Value
of OriginalTransactionId is Below:    Object is NULL  Value of GroupID =   Value of GroupSequence
= 0  Value of CorrelationId =   Value of Persistent = 1  Value of Expiration = 1201683817204
 Value of Priority = 4  Value of ReplyTo is Below:    Object is NULL  Value of Timestamp =
1201676617204  Value of Type =   Value of Content[0] = , check broker.</quote>
> Versions:
> Activemq-cpp-2.1.1
> ActiveMq Broker 4.1.1
> the application handles 17 write-mode queues, with a rather low messages/second rate.
> Using the 5.0.0 broker instead of 4.1.1 would most likely solve this problem, since the failed
> write attempts only occur with the 4.1.1 broker (I reported this bug before, but it seemed
> like no one was interested in taking care of it). However, the 5.0.0 broker won't start with
> preconfigured JAAS queues, so it's not an option and we have to stick with 4.1.1. I will try
> the latest snapshot these days; however, I don't feel good about using a snapshot server in production.

