[ https://issues.apache.org/activemq/browse/AMQCPP-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gali Shvidky reopened AMQCPP-184:
---------------------------------
Regression: [Regression]
We are experiencing this issue with:
ActiveMQ-cpp-2.1.3 (Linux)
ActiveMQ Broker 5.1 (Linux)
It looks like there is a race condition that causes our application to crash. The following is
taken from the core dump (only the relevant thread traces are shown):
Thread 12 (process 28639):
#0 0x003807a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x008fe0fd in pthread_join () from /lib/tls/libpthread.so.0
#2 0x080bd401 in activemq::concurrent::Thread::join (this=0xb775dcc0) at activemq/concurrent/Thread.cpp:102
#3 0x08096599 in activemq::transport::IOTransport::close (this=0xb775bde0) at activemq/transport/IOTransport.cpp:142
#4 0x080e76a3 in activemq::transport::filters::TcpTransport::close (this=0xb77c9938) at ./activemq/transport/TransportFilter.h:205
#5 0x080e26d9 in activemq::transport::filters::ResponseCorrelator::close (this=0xb77c9e90)
at activemq/transport/filters/ResponseCorrelator.cpp:238
#6 0x080e3659 in ~ResponseCorrelator (this=0xb77c9e90) at activemq/transport/filters/ResponseCorrelator.cpp:60
#7 0x0806ab00 in activemq::core::ActiveMQConnectionFactory::createConnection (url=@0xb775eaec,
username=@0xb775eae4, password=@0xb775eae8, clientId=@0x45fab90) at activemq/core/ActiveMQConnectionFactory.cpp:177
#8 0x0806b166 in activemq::core::ActiveMQConnectionFactory::createConnection (this=0xb775eae0)
at activemq/core/ActiveMQConnectionFactory.cpp:66
Thread 1 (process 32611):
#0 0x003807a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x02a787a5 in raise () from /lib/tls/libc.so.6
#2 0x02a7a361 in abort () from /lib/tls/libc.so.6
#3 0x0085a0bc in ut_onsig_call_now () from /opt/CSCOacs/db/dbsrv/lib32/libdbtasks10_r.so
#4 <signal handler called>
#5 0x003807a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#6 0x02a787a5 in raise () from /lib/tls/libc.so.6
#7 0x02a7a209 in abort () from /lib/tls/libc.so.6
#8 0x01a2514b in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#9 0x01a22e61 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#10 0x01a22e96 in std::terminate () from /usr/lib/libstdc++.so.6
#11 0x01a23545 in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#12 0x08095ac5 in activemq::transport::TransportFilter::onTransportException (this=0xb77c9e90,
source=0xb77c9938, ex=@0x9ece188) at activemq/transport/TransportFilter.h:74
#13 0x08095ac5 in activemq::transport::TransportFilter::onTransportException (this=0xb77c9938,
source=0xb775bde0, ex=@0x9ece188) at activemq/transport/TransportFilter.h:74
#14 0x0809713c in activemq::transport::IOTransport::run (this=0xb775bde0) at activemq/transport/IOTransport.h:106
#15 0x080bd51d in activemq::concurrent::Thread::runCallback (param=0xb775dcc0) at activemq/concurrent/Thread.cpp:152
#16 0x008fd371 in start_thread () from /lib/tls/libpthread.so.0
#17 0x02b18ffe in clone () from /lib/tls/libc.so.6
Thread 12 is joining thread 1 while destroying the ResponseCorrelator object, and at the same time
thread 1 is calling the exception callback of that same ResponseCorrelator object.
(gdb) thread 1
[Switching to thread 1 (process 32611)]#0 0x003807a2 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
(gdb) f 12
#12 0x08095ac5 in activemq::transport::TransportFilter::onTransportException (
this=0xb77c9e90, source=0xb77c9938, ex=@0x9ece188)
at activemq/transport/TransportFilter.h:74
74 activemq/transport/TransportFilter.h: No such file or directory.
in activemq/transport/TransportFilter.h
(gdb) set print object on
(gdb) p this
$3 = (activemq::transport::filters::ResponseCorrelator *) 0xb77c9e90
This is the same object that is being destroyed in thread 12:
#6 0x080e3659 in ~ResponseCorrelator (this=0xb77c9e90) at activemq/transport/filters/ResponseCorrelator.cpp:60
The crashed thread and the joined thread are the same:
thread 12: #2 0x080bd401 in activemq::concurrent::Thread::join (this=0xb775dcc0) at activemq/concurrent/Thread.cpp:102
thread 1: #15 0x080bd51d in activemq::concurrent::Thread::runCallback (param=0xb775dcc0) at
activemq/concurrent/Thread.cpp:152
Let's take a closer look at thread 12, at the function ActiveMQConnectionFactory::createConnection(),
and try to find out why it destroys that ResponseCorrelator:
#7 0x0806ab00 in activemq::core::ActiveMQConnectionFactory::createConnection (url=@0xb775eaec,
username=@0xb775eae4, password=@0xb775eae8, clientId=@0x45fab90) at activemq/core/ActiveMQConnectionFactory.cpp:177
....
// Create and Return the new connection object.
connection = new ActiveMQConnection( connectionData );
return connection;
} catch( exceptions::ActiveMQException& ex ) {
ex.setMark( __FILE__, __LINE__ );
delete connection;
delete connector;
delete transport; // <<<<< this is the line 177 >>>>>>>
delete properties;
throw ex;
.....
So it caught an exception and is now cleaning up the allocated objects.
(gdb) f 7
#7 0x0806ab00 in activemq::core::ActiveMQConnectionFactory::createConnection (
url=@0xb775eaec, username=@0xb775eae4, password=@0xb775eae8,
clientId=@0x45fab90) at activemq/core/ActiveMQConnectionFactory.cpp:177
177 activemq/core/ActiveMQConnectionFactory.cpp: No such file or directory.
in activemq/core/ActiveMQConnectionFactory.cpp
(gdb) p ex
$1 = (class activemq::exceptions::ActiveMQException
&) @0xb77c9898: {<cms::CMSException> = {<> = {<No data fields>},
<No data fields>}, message = {static npos = 4294967295,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>>
= {<No data fields>}, <No data fields>},
_M_p = 0xb775e134 "activemq::io::SocketOutputStream::write - Connection reset by peer"}},
Yet the object is destroyed before the connection thread is stopped. I think the connection
thread should be stopped prior to cleanup, something like:
} catch( exceptions::ActiveMQException& ex ) {
ex.setMark( __FILE__, __LINE__ );
// Suggested: stop the transport's thread before freeing anything it may still use
if (transport)
{
transport->close();
}
delete connection;
delete connector;
delete transport; // <<<< this is the line 177>>>>>>
delete properties;
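To illustrate the ordering I have in mind, here is a minimal, self-contained sketch. It does not use the activemq-cpp classes: Listener, ReaderTransport and cleanupInSafeOrder() are hypothetical stand-ins (built on std::thread rather than activemq::concurrent::Thread). The only point it makes is that close() joins the reader thread before anything that thread calls into is deleted.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the object that receives transport exceptions
// (for example the connector). Not an activemq-cpp class.
struct Listener {
    std::atomic<int> calls{0};
    void onTransportException() { ++calls; }
};

// Hypothetical stand-in for IOTransport: it owns a reader thread that
// reports errors to a listener it does not own.
class ReaderTransport {
public:
    explicit ReaderTransport(Listener* listener) : listener_(listener) {}
    ~ReaderTransport() { close(); }

    void start() {
        worker_ = std::thread([this] {
            do {
                // Simulated "Connection reset by peer" notification.
                listener_->onTransportException();
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            } while (!closed_.load());
        });
    }

    // Stops the reader thread and waits for it to exit.
    // Safe to call more than once.
    void close() {
        closed_.store(true);
        if (worker_.joinable()) {
            worker_.join();
        }
    }

private:
    Listener* listener_;
    std::atomic<bool> closed_{false};
    std::thread worker_;
};

// Cleanup in the suggested order: stop the thread first, then free the
// objects it may have been calling into. Returns the number of callbacks
// that were delivered before the thread stopped.
int cleanupInSafeOrder() {
    Listener* connector = new Listener();
    ReaderTransport* transport = new ReaderTransport(connector);
    transport->start();

    transport->close();   // the reader thread has exited once this returns
    int delivered = connector->calls.load();

    delete connector;     // no thread can touch the listener any more
    delete transport;     // the destructor's close() is now a no-op
    return delivered;
}
```

Swapping the close() call and the first delete in this sketch would reintroduce the kind of window seen in the traces above: the reader thread could still be inside onTransportException() on an object that has already been freed.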
> TransportFilter::fire() crashes after accessing a dangling pointer during exception in ActiveMQConnectionFactory::createConnection()
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: AMQCPP-184
> URL: https://issues.apache.org/activemq/browse/AMQCPP-184
> Project: ActiveMQ C++ Client
> Issue Type: Bug
> Affects Versions: 2.1.3
> Environment: Windows XP/Server 2003
> Reporter: python
> Assignee: Timothy Bish
> Fix For: 2.2.1
>
>
> This problem was seen on:
> Versions:
> ActiveMQ-cpp-2.1.3 (WindowsServer2003/XP)
> ActiveMQ Broker 5.1 (WindowsServer2003)
> This looks similar to issue [AMQCPP-122|https://issues.apache.org/activemq/browse/AMQCPP-122], which was fixed in 2.1, but I don't see how IOTransport::run() and error handling have been properly synchronized.
> Steps to reproduce:
> # Continuously try to reconnect to an activemq broker that has run out of memory.
> # This may eventually produce the crash (could take several hours to produce depending on frequency of reconnect attempts).
> # Running activemq-cpp through purify can help reproduce this problem more easily.
> # A "R6025 pure virtual function call" error message may be printed out to the console when this error happens.
> Backtraces:
> Thread 1:
> {noformat}
> activemq::transport::TransportFilter::fire() + 0x48 bytes
> activemq::transport::TransportFilter::fire() + 0x48 bytes
> activemq::transport::IOTransport::fire() + 0x48 bytes
> activemq::transport::IOTransport::run() + 0x7f bytes
> activemq::concurrent::Thread::runCallback() + 0x45 bytes
> msvcr80.dll!781329bb()
> {noformat}
> The crash happens on this line:
> exceptionListener->onTransportException( this, ex );
> Thread 2:
> {noformat}
> activemq::concurrent::Thread::join() Line 108 C++
> activemq::transport::IOTransport::close() Line 143 C++
> activemq::transport::TransportFilter::close() Line 213 C++
> activemq::transport::filters::TcpTransport::close() Line 143 + 0xb bytes C++
> activemq::transport::filters::ResponseCorrelator::close() Line 241 C++
> activemq::transport::filters::ResponseCorrelator::~ResponseCorrelator() Line 64 C++
> activemq::transport::filters::ResponseCorrelator::`scalar deleting destructor'() + 0xf bytes C++
> activemq::core::ActiveMQConnectionFactory::createConnection(const std::basic_string<char,std::char_traits<char>,std::allocator<char>
> activemq::core::ActiveMQConnectionFactory::createConnection() Line 66 + 0x3a bytes C++
> {noformat}
> During ActiveMQConnectionFactory::createConnection() an exception is thrown and the transport object is deleted. Unfortunately, while being deleted this object is still being used by Thread#1 (IOTransport::run).
> I greatly reduced the likelihood of this problem by calling setTransportExceptionListener(NULL) in TransportFilter's destructor.
> After doing that, another crash will start to appear (under the same test conditions) with the following backtrace:
> Thread 1:
> {noformat}
> activemq::connector::openwire::OpenWireCommandReader::readCommand() Line 71 + 0x1e bytes C++
> activemq::transport::IOTransport::run() Line 166 + 0x19 bytes C++
> activemq::concurrent::Thread::runCallback(void * param=0x02a750b0) Line 152 + 0x13 bytes C++
> msvcr80d.dll!_callthreadstartex() Line 348 + 0xf bytes C
> msvcr80d.dll!_threadstartex(void * ptd=0x02a6b8c0) Line 331 C
> kernel32.dll!7c80b683()
> ntdll.dll!7c91b686()
> {noformat}
> The crash happens on this line:
> return openWireFormat->unmarshal( dataInputStream );
> Thread 2:
> {noformat}
> activemq::concurrent::Thread::join() Line 108 C++
> activemq::transport::IOTransport::close() Line 143 C++
> activemq::transport::TransportFilter::close() Line 213 C++
> activemq::transport::filters::TcpTransport::close() Line 143 + 0xb bytes C++
> activemq::transport::filters::ResponseCorrelator::close() Line 241 C++
> activemq::transport::filters::ResponseCorrelator::~ResponseCorrelator() Line 64 C++
> activemq::transport::filters::ResponseCorrelator::`scalar deleting destructor'() + 0xf bytes C++
> activemq::core::ActiveMQConnectionFactory::createConnection(const std::basic_string<char,std::char_traits<char>,std::allocator<char>
> activemq::core::ActiveMQConnectionFactory::createConnection() Line 66 + 0x3a bytes C++
> {noformat}
> This second problem is similar to the first and seems to be caused when the OpenWireConnector is deleted before IOTransport::close() is called. Since IOTransport::run() tries to use the OpenWireConnector (via OpenWireCommandReader::readCommand()), a crash can occur.