Date: Fri, 25 May 2012 18:22:34 -0700
Subject: HBase dies after some time
From: Something Something
To: hbase-user@hadoop.apache.org, zookeeper-user@hadoop.apache.org

Hello,

I recently installed ZooKeeper and HBase on our dedicated Hadoop cluster on EC2. HBase stays active for some time, but after a while it dies with error messages similar to these:

2012-05-25 12:09:27,514 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x5378489312c0004-0x5378489312c0004 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
        at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
2012-05-25 12:09:27,514 ERROR org.apache.hadoop.hbase.master.ActiveMasterManager: master:60000-0x5378489312c0004-0x5378489312c0004 Error deleting our own master address node
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
        at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)

This kills the HMaster as well as all of the HRegionServers. Could it be that my ZooKeeper setup is incorrect?

Please help. Thanks.