From: JordanC
To: users@activemq.apache.org
Date: Thu, 5 Jan 2017 15:15:34 -0800 (PST)
Subject: Calling end on TransactionContext hangs during failover when using master slave

I have a clustered J2EE application that starts a broker on each node, using the failover protocol and a shared data directory.

Node 1: Starts a broker and creates a transport connector at tcp://:61616. Consumer and producer connect via the broker URL failover:(tcp://:61616,tcp://:61617)

Node 2: Starts a broker and creates a transport connector at tcp://:61617. Consumer and producer connect via the broker URL failover:(tcp://:61616,tcp://:61617)

The simple use case that fails for me is the following:

1) Start node 1 first so that it is the master, then start node 2.
2) Send 4 messages on node 1, with a processing delay so that the node can be killed before the messages finish processing. 2 messages are processed on node 1 and 2 on node 2.
3) Forcefully kill node 1 while the messages are being processed.

The two threads on node 2 that were consuming messages both hang after calling TransactionContext#end. They go into ResponseCorrelator#request and send a TransactionInfo command. The TransactionInfo command is consumed, and a response command is created and sent correctly. The problem seems to be that this response command is never read in TcpTransport#doRun. Because of this, ResponseCorrelator#request blocks when it tries to return response.getResult(). The transactions for the 2 messages being processed on node 2 block, so they are never committed.

If I modify my test to send only 2 messages, so that each node processes 1 message, everything runs without problems. The second node ends its transaction successfully by going through exactly the same code path, except that this time the response command is consumed. After that, it also correctly processes the message that was being consumed by node 1.

As soon as I send 4 or more messages, the issue occurs.

Does anyone have any insight into what might be happening? I haven't been able to figure out why the response command is not consumed in the failing case. There are no exceptions, and the response command appears to be sent successfully.
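
To illustrate the symptom (not the root cause), here is a toy model of request/response correlation. It is only a sketch and does not use ActiveMQ's classes: CorrelatorSketch, its fields, and the "OK" / "TIMED_OUT" results are hypothetical names made up for this example. It shows that a caller waiting on a correlated response returns only if some reader thread actually delivers that response; otherwise it waits until a timeout, or forever if there is none.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of request/response correlation. NOT ActiveMQ code; all names are hypothetical. */
public class CorrelatorSketch {
    // Pending requests, keyed by a correlation id.
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Stands in for the transport reader thread handing over a response it has read. */
    void onResponse(int id, String result) {
        CompletableFuture<String> future = pending.remove(id);
        if (future != null) {
            future.complete(result);
        }
    }

    /**
     * Sends a request and waits for the correlated response.
     * If readerDeliversResponse is false, nothing ever completes the future,
     * which models a response that is sent but never read by the client.
     */
    static String request(boolean readerDeliversResponse, long timeoutMillis) {
        CorrelatorSketch correlator = new CorrelatorSketch();
        int id = correlator.nextId.incrementAndGet();
        CompletableFuture<String> future = new CompletableFuture<>();
        correlator.pending.put(id, future);

        if (readerDeliversResponse) {
            Thread reader = new Thread(() -> correlator.onResponse(id, "OK"));
            reader.setDaemon(true);
            reader.start();
        }

        try {
            // Bounded wait; an unbounded future.get() would block forever in the failing case.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            correlator.pending.remove(id);
            return "TIMED_OUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } catch (ExecutionException e) {
            return "FAILED";
        }
    }

    public static void main(String[] args) {
        System.out.println(request(true, 5000));  // reader delivers the response
        System.out.println(request(false, 200));  // response is never read
    }
}
```

In this model the second call returns only because the wait is bounded. With an unbounded wait it would hang in the same way as the consumer threads on node 2, without any exception being thrown.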

--
View this message in context: http://activemq.2283324.n4.nabble.com/Calling-end-on-TransactionContext-hangs-during-failover-when-using-master-slave-tp4720859.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.