hbase-user mailing list archives

From Alok Singh <a...@urbanairship.com>
Subject Re: Distributed log splitting failing after cluster outage.
Date Thu, 06 Mar 2014 19:32:35 GMT
We ran into this a few weeks ago while adding new nodes to an
existing cluster. Due to a misconfiguration, the new nodes were assigned
the wrong ZooKeeper quorum and ended up forming a separate cluster.
We saw a similar error in our logs:

2014-01-30 16:47:19,196 ERROR
org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while
processing event M_META_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for
xxxxx.xxx.urbanairship.com,60020,1385165871751, will retry
	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:182)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: error or interrupted while splitting
logs in [maprfs:/......./xxxx.xxxx.urbanairship.com,60020,1385165871751-splitting]
Task = installed = 1 done = 0 error = 1
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:272)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:284)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:252)
	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:175)


We fixed it by shutting the new nodes down, moving aside the offending logs,
and restarting the master. Later, we fixed the ZooKeeper configuration and
then brought the new nodes back into the cluster.
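
A check along the following lines, run before new nodes join, might catch this kind of quorum mismatch. It is only a minimal sketch, not what we actually ran: it assumes you have already collected the text of each node's hbase-site.xml, and the helper names read_quorum and mismatched_nodes are made up for illustration. The only HBase-specific detail it relies on is the hbase.zookeeper.quorum property.

```python
import xml.etree.ElementTree as ET

QUORUM_PROPERTY = "hbase.zookeeper.quorum"


def read_quorum(hbase_site_xml):
    """Return the set of hosts listed in hbase.zookeeper.quorum.

    Hypothetical helper: takes the text of an hbase-site.xml file and
    returns an empty set if the property is not present.
    """
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == QUORUM_PROPERTY:
            value = prop.findtext("value", "")
            # The value is a comma-separated host list; order is not significant.
            return {host.strip() for host in value.split(",") if host.strip()}
    return set()


def mismatched_nodes(reference_xml, node_xml_by_host):
    """Return the sorted host names whose quorum differs from the reference.

    Hypothetical helper: reference_xml is the config of a node known to be
    in the right cluster; node_xml_by_host maps host name to config text.
    """
    expected = read_quorum(reference_xml)
    return sorted(
        host
        for host, xml_text in node_xml_by_host.items()
        if read_quorum(xml_text) != expected
    )
```

Any host returned by mismatched_nodes would be pointing at a different quorum than the reference node and should be fixed before its region server is started.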

Alok


On Thu, Mar 6, 2014 at 11:13 AM, David Koch <ogdude@googlemail.com> wrote:

> Hello,
>
> Our HBase cluster had an unexpected shutdown, and while trying to bring it
> back up, the Master gets stuck with the following message:
>
> Failed splitting of [ list of <host_name>,<port>,<tmst> ]
> java.io.IOException: error or interrupted while splitting logs in [ list of
> <host_name>,<port>,<tmst> ]
> Task = installed = 10 done = 0 error = 10
> at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:282)
> at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:300)
> at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:242)
> at org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:661)
> at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:580)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:396)
> at java.lang.Thread.run(Thread.java:724)
>
> What can I do to get the cluster operational again? No data ingestion
> had been going on for several hours before the crash, so maybe
> clearing out /hbase/.logs/ could be an option.
>
> Thanks,
>
> /David
>
