activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bob Wiegand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMQCPP-376) Deadlock in IOTransport when network of brokers restart and failover is used.
Date Fri, 26 Aug 2011 15:28:29 GMT

    [ https://issues.apache.org/jira/browse/AMQCPP-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091823#comment-13091823
] 

Bob Wiegand commented on AMQCPP-376:
------------------------------------

I was eager to try out the updated code since we were also experiencing problems (deadlock)
with 3.4.0 failover.  The stack was different in my cases (attached).

I have built the trunk (r1161214) on Linux and Windows.  A few trivial modifications were
needed to compile on Windows (patch and list of files missing from VS project attached). 

I ran 10-20 failover tests with 10 clients (which are both publishing and consuming) messages.
 In every case but one, all clients failed over.  The exception was one Linux client which
crashed during the failover.  I realize that such a report has minimal value since I have
tons of code running on top of ActiveMQ CPP including JNI, but thought I would at least put
it out there in case you can glean anything from it (Java crash file attached).

> Deadlock in IOTransport when network of brokers restart and failover is used. 
> ------------------------------------------------------------------------------
>
>                 Key: AMQCPP-376
>                 URL: https://issues.apache.org/jira/browse/AMQCPP-376
>             Project: ActiveMQ C++ Client
>          Issue Type: Bug
>          Components: Other C++ Clients
>    Affects Versions: 3.4.0
>         Environment: ActiveMQ-CPP  ver - 3.4.0
> Broker  5.3.1
> Machine: Linux mars 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64
x86_64 GNU/Linux
> gcc version: 4.1.2 20080704 (Red Hat 4.1.2-44))
>            Reporter: igor khaustov
>            Assignee: Timothy Bish
>         Attachments: bt_1.txt, bt_2.txt
>
>
> The problem description:
> We  run Network of brokers ( 4 in number ) . 
> Broker URI : broker URI 'failover://(tcp://10.10.13.20:61616,tcp://10.10.13.22:61616,tcp://10.10.13.24:61616,tcp://10.10.13.26:61616)?randomize=true&connection.closeTimeout=10000&transport.soTimeout=3000&timeout=3000&connection.useAsyncSend=true&connection.alwaysSyncSend=false'
> Producer loads broker with 1000 message/sec . We testing the producer behavior while
failover by  restarting all brokers in row ( all 4 ) while sending the messages and get deadlock
as shown below .
> Note: the problem tested only with network on brokers .
> The backtrace ( only relevant threads ):
> +Thread 16 (process 26892):+
> *#0  0x00000032ef00ce74 in __lll_lock_wait () from /lib64/libpthread.so.0*
> #1  0x00000032ef008874 in _L_lock_106 () from /lib64/libpthread.so.0
> #2  0x00000032ef0082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000dc5a04 in decaf::internal::util::concurrent::MutexImpl::lock (handle=0xfefdd38)
at decaf/internal/util/concurrent/unix/MutexImpl.cpp:77
> #4  0x0000000000bd9092 in decaf::util::concurrent::Mutex::lock (this=0xff54100) at decaf/util/concurrent/Mutex.cpp:111
> #5  0x0000000000d51f3f in decaf::util::AbstractCollection<decaf::lang::Pointer<activemq::transport::Transport,
decaf::util::concurrent::atomic::AtomicRefCounter> >::lock (this=0xff540f8) at ./decaf/util/AbstractCollection.h:331
> #6  0x0000000000bd86c9 in decaf::util::concurrent::Lock::lock (this=0x4c7b9c90) at decaf/util/concurrent/Lock.cpp:54
> #7  0x0000000000bd883a in Lock (this=0x4c7b9c90, object=0xff54188, intiallyLocked=true)
at decaf/util/concurrent/Lock.cpp:32
> *#8  0x0000000000d47a77 in activemq::transport::failover::CloseTransportsTask::add (this=0xff540e8,
transport=@0x4c7b9cf0) at activemq/transport/failover/CloseTransportsTask.cpp:46*
> #9  0x0000000000b1b748 in activemq::transport::failover::FailoverTransport::handleTransportFailure
(this=0xffed498, error=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransport.cpp:483
> #10 0x0000000000b41a06 in activemq::transport::failover::FailoverTransportListener::onException
(this=0xfde2e58, ex=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransportListener.cpp:76
> #11 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x10627498,
ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #12 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x10627498,
ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #13 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0xfeeb558,
ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #14 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0xfeeb558,
ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #15 0x0000000000d554c8 in activemq::transport::inactivity::InactivityMonitor::onException
(this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/inactivity/InactivityMonitor.cpp:312
> #16 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x1020c118,
ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #17 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x1020c118,
ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #18 0x0000000000d326f2 in activemq::transport::IOTransport::fire (this=0xdce10b8, ex=@0x4c7b9ee0)
at activemq/transport/IOTransport.cpp:87
> #19 0x0000000000d32982 in activemq::transport::IOTransport::run (this=0xdce10b8) at activemq/transport/IOTransport.cpp:264
> #20 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0x105871d8)
at decaf/lang/Thread.cpp:137
> #21 0x0000000000ba9068 in threadWorker (arg=0x105871d8) at decaf/lang/Thread.cpp:190
> #22 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
> #23 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
> +Thread 9 (process 14470):+
> *#0  0x00000032ef00a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0*
> #1  0x0000000000dc54b3 in decaf::internal::util::concurrent::ConditionImpl::wait (condition=0x1072d2b8)
at decaf/internal/util/concurrent/unix/ConditionImpl.cpp:101
> #2  0x0000000000bd9033 in decaf::util::concurrent::Mutex::wait (this=0x105871d8) at decaf/util/concurrent/Mutex.cpp:126
> #3  0x0000000000ba8538 in decaf::lang::Thread::join (this=0x12a4a418) at decaf/lang/Thread.cpp:452
> #4  0x0000000000d32c28 in activemq::transport::IOTransport::close (this=0xdce10b8) at
activemq/transport/IOTransport.cpp:222
> #5  0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0x1020c118)
at activemq/transport/TransportFilter.cpp:106
> #6  0x0000000000b47d3a in activemq::transport::tcp::TcpTransport::close (this=0x1020c118)
at activemq/transport/tcp/TcpTransport.cpp:74
> #7  0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0xfeeb558)
at activemq/transport/TransportFilter.cpp:106
> #8  0x0000000000d554ec in activemq::transport::inactivity::InactivityMonitor::close (this=0xfeeb558)
at activemq/transport/inactivity/InactivityMonitor.cpp:300
> #9  0x0000000000d77867 in activemq::wireformat::openwire::OpenWireFormatNegotiator::close
(this=0x10627498) at activemq/wireformat/openwire/OpenWireFormatNegotiator.cpp:248
> *#10 0x0000000000d478ff in activemq::transport::failover::CloseTransportsTask::iterate
(this=0xff540e8) at activemq/transport/failover/CloseTransportsTask.cpp:75*
> #11 0x0000000000d25891 in activemq::threads::CompositeTaskRunner::iterate (this=0xddc0108)
at activemq/threads/CompositeTaskRunner.cpp:173
> #12 0x0000000000d25ae4 in activemq::threads::CompositeTaskRunner::run (this=0xddc0108)
at activemq/threads/CompositeTaskRunner.cpp:107
> #13 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0xfeeb2b8)
at decaf/lang/Thread.cpp:137
> #14 0x0000000000ba9068 in threadWorker (arg=0xfeeb2b8) at decaf/lang/Thread.cpp:190
> #15 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
> #16 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
> As you can see +Thread 16+ is on lock_wait for *_synchronized( &transports )_* in
activemq::transport::failover::CloseTransportsTask::add .
> The *_synchronized( &transports )_* in locked by +Thread 9+ in activemq::threads::CompositeTaskRunner::iterate
. But  +Thread 9+ is on pthread_cond_wait which has to be signalled by the +Thread 16+.
> Kind regards .
> Igor.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message