hbase-user mailing list archives

From Marco Gallotta <ma...@gallotta.co.za>
Subject Hbase master failing to start after reaching 95% disk and expanding cluster
Date Sat, 29 Dec 2012 11:44:57 GMT
Hi there 

I've been running an HBase cluster for several months, and it recently ran into problems
as the nodes reached 95% disk capacity. I added an extra node, and now the master keeps crashing
with the errors below. I then increased the disk capacity on each individual node, but the
errors are the same. Removing the new node doesn't help either.

There are similar errors in the regionserver and zookeeper logs, but they all seem to echo
the errors in the master logs.

Is there anything I can look at to help diagnose the problem?

hbase-root-master-analytics.log:
Sat Dec 29 03:14:22 PST 2012 Starting master on analytics
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59480
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 59480
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
2012-12-29 03:14:24,601 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,614 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,622 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,631 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,636 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,643 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,651 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,665 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,675 INFO org.apache.hadoop.ipc.HBaseServer: Starting Thread-2
2012-12-29 03:14:24,698 INFO org.apache.hadoop.ipc.HBaseServer: Starting IPC Server listener on 60000
2012-12-29 03:14:25,322 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2012-12-29 03:14:28,735 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2012-12-29 03:14:32,797 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2012-12-29 03:14:41,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2012-12-29 03:14:41,427 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
2012-12-29 03:14:41,428 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1740)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:146)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1754)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:176)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:896)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:281)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1735)
        ... 5 more
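
The ConnectionLoss errors above suggest the master could not establish or keep a session with ZooKeeper, so one possible first check is whether the ZooKeeper server answers on its client port. The following is only a minimal sketch of such a probe, not something taken from the logs: the function name zk_ruok is made up for illustration, and the host localhost and port 2181 (ZooKeeper's default client port) are assumptions, since the actual quorum address does not appear in this message. It relies on ZooKeeper's "ruok" four-letter command, which a healthy server answers with "imok"; newer ZooKeeper releases only answer it if it is whitelisted in the server configuration.

```python
import socket


def zk_ruok(host="localhost", port=2181, timeout=3.0):
    """Send ZooKeeper's 'ruok' four-letter command.

    Returns True only if the server replies 'imok'. host and port are
    assumptions; replace them with the real quorum address.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            while True:
                # The server closes the connection after replying,
                # so read until recv() returns an empty bytes object.
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks) == b"imok"
    except OSError:
        # Covers refused connections, timeouts, and resets.
        return False


# Example use (hypothetical address):
#   print("ZooKeeper answers ruok:", zk_ruok("localhost", 2181))
```

A False result here would point at ZooKeeper itself (not running, unreachable, or unable to serve requests) rather than at the HBase master; a True result would only show that the server process is up, not that the quorum is healthy.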


-- 
Marco Gallotta | Mountain View, California
Software Engineer, Infrastructure | Loki Studios
fb.me/marco.gallotta | twitter.com/marcog
marco@gallotta.co.za | +1 (650) 417-3313
