hbase-user mailing list archives

From Gunnar Tapper <tapper.gun...@gmail.com>
Subject Splitting causes HBase to crash
Date Fri, 13 May 2016 07:17:18 GMT
Hi,

I'm doing some development testing with Apache Trafodion running on
HBase version 1.0.0-cdh5.4.5.

All of a sudden, HBase started crashing. At first, it could not be
recovered until I set hbase_master_distributed_log_splitting to false.
After that, HBase restarted and sat idle without problems for 1 hour. I
then started Trafodion and let it sit idle for 1 hour.

I then started a workload and all RegionServers came crashing down. Looking
at the log files, I suspected ZooKeeper issues so I restarted ZooKeeper and
then HBase. Now, the HMaster fails with:

2016-05-13 07:13:52,521 INFO org.apache.hadoop.hbase.master.RegionStates: Transition {a33adb83f77095913adb4701b01c09a0 state=PENDING_OPEN, ts=1463123333157, server=ip-172-31-50-109.ec2.internal,60020,1463122925684} to {a33adb83f77095913adb4701b01c09a0 state=OPENING, ts=1463123632517, server=ip-172-31-50-109.ec2.internal,60020,1463122925684}
2016-05-13 07:13:52,527 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x354a8eaea3e007d, quorum=ip-172-31-53-252.ec2.internal:2181,ip-172-31-54-241.ec2.internal:2181,ip-172-31-61-36.ec2.internal:2181, baseZNode=/hbase Unable to list children of znode /hbase/region-in-transition
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1466)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:296)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:518)
at org.apache.hadoop.hbase.master.AssignmentManager$5.run(AssignmentManager.java:1420)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-05-13 07:13:52,527 INFO org.apache.hadoop.hbase.procedure.flush.MasterFlushTableProcedureManager: stop: server shutting down.
2016-05-13 07:13:52,527 INFO org.apache.hadoop.hbase.ipc.RpcServer: Stopping server on 60000
2016-05-13 07:13:52,527 INFO org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.listener,port=60000: stopping
2016-05-13 07:13:52,528 INFO org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.responder: stopped
2016-05-13 07:13:52,528 INFO org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.responder: stopping
2016-05-13 07:13:52,532 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@33d4a2bd rejected from java.util.concurrent.ThreadPoolExecutor@4d0840e0[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 38681]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
at org.apache.hadoop.hbase.master.AssignmentManager.zkEventWorkersSubmit(AssignmentManager.java:1285)
at org.apache.hadoop.hbase.master.AssignmentManager.handleAssignmentEvent(AssignmentManager.java:1479)
at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:1244)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:458)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-05-13 07:13:52,533 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/rs/ip-172-31-50-109.ec2.internal,60000,1463122925543 already deleted, retry=false
2016-05-13 07:13:52,534 INFO org.apache.zookeeper.ZooKeeper: Session: 0x354a8eaea3e007d closed
2016-05-13 07:13:52,534 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server ip-172-31-50-109.ec2.internal,60000,1463122925543; zookeeper connection closed.
2016-05-13 07:13:52,534 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: master/ip-172-31-50-109.ec2.internal/172.31.50.109:60000 exiting
2016-05-13 07:13:52,534 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
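
The two ts= values in the RegionStates transition entry above appear to be epoch timestamps in milliseconds. As a rough aid for reading that entry, here is a minimal Python sketch that decodes them and computes how long the region sat in PENDING_OPEN before moving to OPENING. The helper names (ms_to_utc, seconds_between) are made up for illustration, and the reading of ts= as epoch milliseconds in UTC is an assumption, not something the log states.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the RegionStates transition entry in the log.
PENDING_OPEN_TS_MS = 1463123333157
OPENING_TS_MS = 1463123632517

# Assumption: ts= is milliseconds since the Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ts_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ts_ms)


def seconds_between(start_ms, end_ms):
    """Elapsed time between two epoch-millisecond timestamps, in seconds."""
    return (end_ms - start_ms) / 1000


if __name__ == "__main__":
    print("PENDING_OPEN at:", ms_to_utc(PENDING_OPEN_TS_MS).isoformat())
    print("OPENING at:     ", ms_to_utc(OPENING_TS_MS).isoformat())
    print("seconds in PENDING_OPEN:",
          seconds_between(PENDING_OPEN_TS_MS, OPENING_TS_MS))
```

Under that assumption, the OPENING timestamp lands on 2016-05-13 07:13:52 UTC, which lines up with the time on the log entry itself, and the region spent roughly 299 seconds (about five minutes) in PENDING_OPEN.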

Suggestions on how to move forward so that I can recover this system?

-- 
Thanks,

Gunnar
*If you think you can you can, if you think you can't you're right.*
