hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBase crashed: FATAL HMaster: Shutting down HBase cluster: file system not available
Date Wed, 07 Oct 2009 16:51:54 GMT
By design, HMaster will shut itself down if it loses its zookeeper lease.
Can you try to figure out why this happened?  Was there something running on
the HMaster machine that stole all the i/o or put the machine into swap?
(It's unlikely that the HMaster itself was the culprit since it does near to
nought.)  You could try upping the zookeeper session timeout, or you could run
more HMaster instances so that if this happens again, there'll be an instance
to fail over to.
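
A minimal sketch of raising the session timeout in hbase-site.xml, assuming
the zookeeper.session.timeout property (in milliseconds) is what this HBase
version reads. The value below is purely illustrative, not a recommendation.
The ZooKeeper server may also cap the session timeout it grants based on its
tickTime, so check that setting too:

```xml
<!-- hbase-site.xml fragment: the value is an illustrative assumption -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; a longer timeout tolerates longer pauses before the
       session, and the master's znode with it, is expired -->
  <value>120000</value>
</property>
```

A longer timeout also means a genuinely dead server takes longer to be
noticed, so it is a trade-off rather than a pure fix.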

Yours,
St.Ack



On Wed, Oct 7, 2009 at 9:43 AM, Lucas Nazário dos Santos <
nazario.lucas@gmail.com> wrote:

> Hello,
>
> My HBase cluster crashed today after a couple of days running and the logs
> show the exception below (at the end of this message).
>
> Some log excerpts that caught my attention are:
>
> 2009-10-07 11:25:17,032 ERROR org.apache.hadoop.hbase.master.HMaster:
> Master
> lost its znode, killing itself now
> 2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster:
> Shutting down HBase cluster: file system not available
>
> Any clue on what happened? What could I do to prevent this from occurring
> in
> the future?
>
> Thanks!
> Lucas
>
>
>
> 2009-10-07 11:24:42,823 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 9 row(s) of meta region {server:
> 192.168.1.3:60020, regionname: .META.,,1, startKey: <>} complete
> 2009-10-07 11:24:42,823 INFO org.apache.hadoop.hbase.master.BaseScanner:
> All
> 1 .META. region(s) scanned
> 2009-10-07 11:25:06,311 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x1242b188e8a0001 to sun.nio.ch.SelectionKeyImpl@148c02f
> java.io.IOException: TIMED OUT
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> 2009-10-07 11:25:06,702 INFO org.apache.zookeeper.ClientCnxn: Attempting
> connection to server server2/192.168.1.3:2181
> 2009-10-07 11:25:06,702 INFO org.apache.zookeeper.ClientCnxn: Priming
> connection to java.nio.channels.SocketChannel[connected local=/
> 192.168.1.3:49602 remote=server2/192.168.1.3:2181]
> 2009-10-07 11:25:06,703 INFO org.apache.zookeeper.ClientCnxn: Server
> connection successful
> 2009-10-07 11:25:16,911 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x242b1890c70000 to sun.nio.ch.SelectionKeyImpl@1060478
> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
> lim=4 cap=4]
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> 2009-10-07 11:25:16,911 INFO org.apache.hadoop.hbase.master.ServerManager:
> server2,60020,1254853514050 znode expired
> 2009-10-07 11:25:17,021 INFO org.apache.hadoop.hbase.master.RegionManager:
> META region removed from onlineMetaRegions
> 2009-10-07 11:25:17,032 ERROR org.apache.hadoop.hbase.master.HMaster:
> Master
> lost its znode, killing itself now
> 2009-10-07 11:25:17,032 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server server2,60020,1254853514050: logSplit: false, rootRescanned: false,
> numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
> 2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster:
> Shutting down HBase cluster: file system not available
> java.io.IOException: File system is not available
>        at
>
> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125)
>        at
> org.apache.hadoop.hbase.master.HMaster.checkFileSystem(HMaster.java:324)
>        at
> org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:525)
>        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426)
> Caused by: java.io.IOException: Filesystem closed
>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:197)
>        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:585)
>        at
>
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
>        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
>        at
>
> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114)
>        ... 3 more
> 2009-10-07 11:25:17,174 INFO org.apache.hadoop.hbase.master.HMaster:
> Stopping infoServer
>
