From: Cameron McKenzie
Reply-To: cammckenzie@apache.org
To: user@zookeeper.apache.org
Date: Wed, 12 Nov 2014 13:08:57 +1100
Subject: Reconnection with expired session

Guys,

I have a (possibly somewhat contrived) issue relating to reconnection of a client to ZK after quorum has been lost and data has been corrupted.

Essentially this is what's happening:

- Client connects to a 3 node ZK cluster
- Client writes some ephemeral zNodes etc.
- All nodes in the ZK cluster are shut down
- Contents of the data/version-2 directories are removed on each ZK instance (i.e. the acceptedEpoch, currentEpoch and all the snapshots and transaction logs)
- Restart the nodes in the ZK cluster

At this point, the ZK cluster comes up fine, but the client will not automatically reconnect. Having stepped through the client code with a debugger, it seems like the server just doesn't respond to the session initialisation request. These are the logs, which are repeated every second. Note that if I restart the client, everything's fine.

12:56:35.978 [main-SendThread(ubuntubox:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server ubuntubox/192.168.56.102:2181. Will not attempt to authenticate using SASL (unknown error)
12:56:35.980 [main-SendThread(ubuntubox:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to ubuntubox/192.168.56.102:2181, initiating session
12:56:35.983 [main-SendThread(ubuntubox:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on ubuntubox/192.168.56.102:2181
12:56:36.002 [main-SendThread(ubuntubox:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x249a1b64cc90000, likely server has closed socket, closing socket connection and attempting reconnect
12:56:37.833 [main-SendThread(ubuntubox:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server ubuntubox/192.168.56.102:2182. Will not attempt to authenticate using SASL (unknown error)
12:56:37.834 [main-SendThread(ubuntubox:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to ubuntubox/192.168.56.102:2182, initiating session
12:56:37.835 [main-SendThread(ubuntubox:2182)] DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on ubuntubox/192.168.56.102:2182
12:56:37.859 [main-SendThread(ubuntubox:2182)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x249a1b64cc90000, likely server has closed socket, closing socket connection and attempting reconnect
12:56:38.298 [main-SendThread(ubuntubox:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server ubuntubox/192.168.56.102:2183. Will not attempt to authenticate using SASL (unknown error)
12:56:38.299 [main-SendThread(ubuntubox:2183)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to ubuntubox/192.168.56.102:2183, initiating session
12:56:38.300 [main-SendThread(ubuntubox:2183)] DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on ubuntubox/192.168.56.102:2183

Can someone explain what's going on? Is this a bug? While I understand that it's slightly contrived, the destruction of the data is certainly a possibility, and having to restart every client even when the cluster comes back up is not ideal.

cheers
Cam
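
For illustration only, here is a minimal sketch of one possible client-side workaround for the "restart every client" problem described above: a small watchdog that notices when the client has been disconnected for longer than some limit, so the application can close the old ZooKeeper handle and create a new one instead of restarting the whole process. This is not part of the ZooKeeper client API and does not explain the server behaviour. The class name ReconnectWatchdog, its methods and the 4000 ms limit are all hypothetical. The sketch assumes the application's Watcher forwards KeeperState.SyncConnected to onConnected() and KeeperState.Disconnected to onDisconnected(), and polls shouldRecreate() from a timer.

```java
/**
 * Hypothetical helper (not a ZooKeeper class): tracks how long the client
 * has been without a connection and reports when the handle should be
 * thrown away and recreated. Time is passed in explicitly so the logic
 * can be exercised without a real clock or a real ZooKeeper cluster.
 */
final class ReconnectWatchdog {
    /** The only two connection states this sketch distinguishes. */
    enum State { CONNECTED, DISCONNECTED }

    private final long limitMillis;        // assumed limit, e.g. the session timeout
    private State state = State.DISCONNECTED;
    private long disconnectedSinceMillis;  // when the current outage started

    ReconnectWatchdog(long limitMillis, long nowMillis) {
        this.limitMillis = limitMillis;
        // A freshly created handle has not connected yet.
        this.disconnectedSinceMillis = nowMillis;
    }

    /** Call when the Watcher sees KeeperState.SyncConnected. */
    synchronized void onConnected() {
        state = State.CONNECTED;
    }

    /** Call when the Watcher sees KeeperState.Disconnected. */
    synchronized void onDisconnected(long nowMillis) {
        // Only the first disconnect of an outage starts the timer.
        if (state == State.CONNECTED) {
            state = State.DISCONNECTED;
            disconnectedSinceMillis = nowMillis;
        }
    }

    /** True once the outage has lasted at least limitMillis. */
    synchronized boolean shouldRecreate(long nowMillis) {
        return state == State.DISCONNECTED
                && nowMillis - disconnectedSinceMillis >= limitMillis;
    }

    /** Call after the old handle was closed and a new one was created. */
    synchronized void onRecreated(long nowMillis) {
        state = State.DISCONNECTED;
        disconnectedSinceMillis = nowMillis;
    }

    public static void main(String[] args) {
        ReconnectWatchdog watchdog = new ReconnectWatchdog(4000L, 0L);
        watchdog.onConnected();
        watchdog.onDisconnected(1000L);
        System.out.println(watchdog.shouldRecreate(3000L)); // false: 2000 ms outage
        System.out.println(watchdog.shouldRecreate(5000L)); // true: 4000 ms outage
    }
}
```

When shouldRecreate() returns true, the application would close the existing handle and construct a new one, then call onRecreated(). A new handle means a new session, so any ephemeral zNodes and watches owned by the old session would have to be set up again by the application.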