From: Christian Schäfer
Subject: Re: HBase dies after some time
Date: Sat, 26 May 2012 11:28:34 +0200

Hi,

I got exactly the same behaviour and exceptions that you mention on a local cluster.
In my case the sum of all services' heap sizes was higher than the actual memory of the machine.

First, sum the heap sizes of the services on your master machine, which is likely running NameNode, HMaster, ZooKeeper, and maybe also a RegionServer and DataNode.
Then check that this sum is less than your master machine's memory.

Good luck.
Chris

From: Something Something
To: hbase-user@hadoop.apache.org; zookeeper-user@hadoop.apache.org
Sent: 3:22 Saturday, 26 May 2012
Subject: HBase dies after some time

Hello,

I recently installed ZooKeeper & HBase on our dedicated Hadoop cluster on EC2. HBase stays active for some time, but after a while it dies with error messages similar to these:

2012-05-25 12:09:27,514 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x5378489312c0004-0x5378489312c0004 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
2012-05-25 12:09:27,514 ERROR org.apache.hadoop.hbase.master.ActiveMasterManager: master:60000-0x5378489312c0004-0x5378489312c0004 Error deleting our own master address node
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)

This kills the HMaster as well as all HRegionServers. Could it be that my ZooKeeper setup is incorrect? Please help. Thanks.
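
The heap check suggested in the reply (add up the maximum heap of every daemon on the machine and compare the total with physical memory) can be sketched roughly as follows. This is only an illustration: the function names, the daemon list, the -Xmx values and the 7 GB machine size are all assumptions, not figures from this thread. Read the real values from your own hbase-env.sh, hadoop-env.sh and ZooKeeper start scripts.

```python
def parse_xmx_mb(flag):
    """Convert a JVM max-heap flag such as '-Xmx1g' or '-Xmx512m' to whole megabytes."""
    if not flag.startswith("-Xmx"):
        raise ValueError("not an -Xmx flag: %r" % flag)
    value = flag[len("-Xmx"):].lower()
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    if value and value[-1] in units:
        size_bytes = int(value[:-1]) * units[value[-1]]
    else:
        size_bytes = int(value)  # the JVM reads a bare number as bytes
    return size_bytes // (1024 ** 2)


def check_heap_budget(xmx_flags, physical_mb):
    """Return (total heap in MB, True if the total is below physical memory)."""
    total_mb = sum(parse_xmx_mb(flag) for flag in xmx_flags.values())
    return total_mb, total_mb < physical_mb


# Hypothetical settings for a master node that runs every daemon.
daemons = {
    "NameNode": "-Xmx1g",
    "HMaster": "-Xmx1g",
    "ZooKeeper": "-Xmx512m",
    "RegionServer": "-Xmx4g",
    "DataNode": "-Xmx1g",
}
physical_mb = 7 * 1024  # assumed RAM of the machine, in MB

total_mb, fits = check_heap_budget(daemons, physical_mb)
print("total heap: %d MB, physical: %d MB, fits: %s" % (total_mb, physical_mb, fits))
```

Passing this check is necessary rather than sufficient: each JVM also needs memory outside its heap, and the operating system needs some for itself, so it is worth leaving a margin below physical memory.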