Return-Path: X-Original-To: apmail-activemq-users-archive@www.apache.org Delivered-To: apmail-activemq-users-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A6365178D4 for ; Thu, 23 Oct 2014 09:51:39 +0000 (UTC) Received: (qmail 43749 invoked by uid 500); 23 Oct 2014 09:51:39 -0000 Delivered-To: apmail-activemq-users-archive@activemq.apache.org Received: (qmail 43700 invoked by uid 500); 23 Oct 2014 09:51:39 -0000 Mailing-List: contact users-help@activemq.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: users@activemq.apache.org Delivered-To: mailing list users@activemq.apache.org Received: (qmail 43685 invoked by uid 99); 23 Oct 2014 09:51:38 -0000 Received: from ec2-54-76-25-247.eu-west-1.compute.amazonaws.com (HELO mx1-eu-west.apache.org) (54.76.25.247) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 23 Oct 2014 09:51:38 +0000 Received: from mx1-eu-west.apache.org (localhost [127.0.0.1]) by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org) with ESMTP id 9893226E40 for ; Thu, 23 Oct 2014 09:51:37 +0000 (UTC) Received: by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org, from userid 113) id 8D97926E48; Thu, 23 Oct 2014 09:51:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on mx1-eu-west.apache.org X-Spam-Level: ** X-Spam-Status: No, score=2.5 required=10.0 tests=SPF_HELO_PASS,URIBL_BLOCKED, URI_HEX,URI_TRY_3LD autolearn=disabled version=3.4.0 Received: from mwork.nabble.com (unknown [162.253.133.43]) by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org) with ESMTP id D312526E40 for ; Thu, 23 Oct 2014 09:51:36 +0000 (UTC) Received: from mjoe.nabble.com (unknown [162.253.133.57]) by mwork.nabble.com (Postfix) with ESMTP id 95E0E859FC6 for ; Thu, 23 Oct 2014 02:51:34 -0700 (PDT) Date: Thu, 23 Oct 2014 02:48:49 -0700 (PDT) From: uromahn To: users@activemq.apache.org Message-ID: <1414057729155-4686658.post@n4.nabble.com> In-Reply-To: References: Subject: Re: ActiveMQ Master Slave with ZooKeeper Session Expiration MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV using ClamSMTP I have experimented with ActiveMQ's replicated LevelDB store for quite some time now and finally came to the conclusion that it is very "flaky" and "brittle" at best. I believe it is not ready for production yet and everybody who considers using it in a hot production environment should probably think twice (or trice). For example, I have a running active/passive cluster with three instances (one active, two passive) connecting to three Zookeeper nodes. I ran several tests against this cluster and then left it sitting over night. The next morning I came back I saw the following exceptions in the log files *without* any activity! On the active node: 2014-10-22 22:31:05,027 | WARN | Unexpected session error: java.io.IOException: Connection reset by peer | org.apache.activemq.leveldb.replicated.MasterLevelDBStore | hawtdispatch-DEFAULT-2 2014-10-22 22:31:05,028 | INFO | Slave has disconnected: d8175d91-6602-4abf-bdfc-2ab07c40997e | org.apache.activemq.leveldb.replicated.MasterLevelDBStore | hawtdispatch-DEFAULT-2 On node #2 (passive): 2014-10-22 22:31:04,796 | INFO | Client session timed out, have not heard from server in 29584ms for sessionid 0x24937f0a2a70001, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk2:2181) 2014-10-22 22:31:05,098 | INFO | Slave stopped | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[brokergrp1] Task-3 2014-10-22 22:31:05,494 | INFO | Opening socket connection to server uromahn-zk3/10.XX.YY.ZZ:2181. Will not attempt to authenticate using SASL (unknown error) | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk3:2181) 2014-10-22 22:31:05,524 | INFO | Socket connection established to uromahn-zk3/10.XX.YY.ZZ:2181, initiating session | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk3:2181) 2014-10-22 22:31:05,564 | WARN | Expiration detected: trying to restart... | org.apache.activemq.leveldb.replicated.groups.ZKClient | main-EventThread 2014-10-22 22:31:05,564 | INFO | Initiating client connection, connectString=uromahn-zk1:2181,uromahn-zk2:2181,uromahn-zk3:2181 sessionTimeout=2000 watcher=org.apache.activemq.leveldb.replicated.groups.ZKClient@3c53cec2 | org.apache.zookeeper.ZooKeeper | main-EventThread 2014-10-22 22:31:05,566 | INFO | Unable to reconnect to ZooKeeper service, session 0x24937f0a2a70001 has expired, closing socket connection | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk3:2181) 2014-10-22 22:31:05,572 | INFO | Opening socket connection to server uromahn-zk1/10.XX.AA.BB:2181. Will not attempt to authenticate using SASL (unknown error) | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk1:2181) 2014-10-22 22:31:05,579 | INFO | EventThread shut down | org.apache.zookeeper.ClientCnxn | main-EventThread 2014-10-22 22:31:05,597 | INFO | Socket connection established to uromahn-zk1/10.XX.AA.BB:2181, initiating session | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk1:2181) 2014-10-22 22:31:05,625 | INFO | Session establishment complete on server uromahn-zk1/10.XX.AA.BB:2181, sessionid = 0x14937f0a2840000, negotiated timeout = 4000 | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk1:2181) 2014-10-22 22:31:05,745 | WARN | listeners are taking too long to process the events | org.apache.activemq.leveldb.replicated.groups.ChangeListenerSupport | ZooKeeper state change dispatcher thread On node #3 (passive): no entry int he log file for that time. And these are the log entries in my corresponding Zookeeper nodes: Node #1: 2014-10-22 22:31:05,604 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.XX.YY.ZZ:38788 2014-10-22 22:31:05,606 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.XX.YY.ZZ:38788 2014-10-22 22:31:05,612 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x14937f0a2840000 with negotiated timeout 4000 for client /10.XX.YY.ZZ:38788 Node #2: 2014-10-22 22:30:40,000 [myid:2] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x24937f0a2a70001, timeout of 4000ms exceeded 2014-10-22 22:30:40,002 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24937f0a2a70001 2014-10-22 22:30:40,009 [myid:2] - INFO [CommitProcessor:2:NIOServerCnxn@1007] - Closed socket connection for client /10.XX.YY.ZZ:52422 which had sessionid 0x24937f0a2a70001 2014-10-22 22:31:05,639 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14937f0a2840000 type:setData cxid:0x1 zxid:0x100000014 txntype:-1 reqpath:n/a Error Path:/activemq-grp1/leveldb-stores/00000000001 Error:KeeperErrorCode = NoNode for /activemq-grp1/leveldb-stores/00000000001 Node #3: 2014-10-22 22:30:39,972 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@118] - Got zxid 0x100000012 expected 0x1 2014-10-22 22:31:05,494 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.XX.YY.ZZ:60802 2014-10-22 22:31:05,497 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x24937f0a2a70001 at /10.XX.YY.ZZ:60802 2014-10-22 22:31:05,497 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x24937f0a2a70001 2014-10-22 22:31:05,516 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@610] - Invalid session 0x24937f0a2a70001 for client /10.XX.YY.ZZ:60802, probably expired 2014-10-22 22:31:05,519 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.XX.YY.ZZ:60802 which had sessionid 0x24937f0a2a70001 One should note and emphasize again: during the time I see those messages in the log there was absolutely no activity on the broker and the Zookeeper nodes (those are exclusively for my brokers)! This simply doesn't give me the confidence of a robust solution - sorry guys! -- View this message in context: http://activemq.2283324.n4.nabble.com/ActiveMQ-Master-Slave-with-ZooKeeper-Session-Expiration-tp4686562p4686658.html Sent from the ActiveMQ - User mailing list archive at Nabble.com.