hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase crashed: FATAL HMaster: Shutting down HBase cluster: file system not available
Date Wed, 07 Oct 2009 16:58:38 GMT
Looks like your DFS NameNode became unavailable at about the same time that the ZooKeeper
timeouts started happening. Could the machine be overloaded? Anything relevant in the NameNode logs?

   - Andy




________________________________
From: Lucas Nazário dos Santos <nazario.lucas@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Wed, October 7, 2009 9:43:49 AM
Subject: HBase crashed: FATAL HMaster: Shutting down HBase cluster: file system not available

Hello,

My HBase cluster crashed today after running for a couple of days, and the logs
show the exception below (at the end of this message).

Some log excerpts that caught my attention are:

2009-10-07 11:25:17,032 ERROR org.apache.hadoop.hbase.master.HMaster: Master
lost its znode, killing itself now
2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster:
Shutting down HBase cluster: file system not available

Any clue on what happened? What could I do to prevent this from occurring in
the future?

Thanks!
Lucas



2009-10-07 11:24:42,823 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 9 row(s) of meta region {server:
192.168.1.3:60020, regionname: .META.,,1, startKey: <>} complete
2009-10-07 11:24:42,823 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2009-10-07 11:25:06,311 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x1242b188e8a0001 to sun.nio.ch.SelectionKeyImpl@148c02f
java.io.IOException: TIMED OUT
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-10-07 11:25:06,702 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server server2/192.168.1.3:2181
2009-10-07 11:25:06,702 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected local=/
192.168.1.3:49602 remote=server2/192.168.1.3:2181]
2009-10-07 11:25:06,703 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2009-10-07 11:25:16,911 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x242b1890c70000 to sun.nio.ch.SelectionKeyImpl@1060478
java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
lim=4 cap=4]
        at
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
2009-10-07 11:25:16,911 INFO org.apache.hadoop.hbase.master.ServerManager:
server2,60020,1254853514050 znode expired
2009-10-07 11:25:17,021 INFO org.apache.hadoop.hbase.master.RegionManager:
META region removed from onlineMetaRegions
2009-10-07 11:25:17,032 ERROR org.apache.hadoop.hbase.master.HMaster: Master
lost its znode, killing itself now
2009-10-07 11:25:17,032 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
server server2,60020,1254853514050: logSplit: false, rootRescanned: false,
numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster:
Shutting down HBase cluster: file system not available
java.io.IOException: File system is not available
        at
org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125)
        at
org.apache.hadoop.hbase.master.HMaster.checkFileSystem(HMaster.java:324)
        at
org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:525)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426)
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:197)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:585)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
        at
org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114)
        ... 3 more
2009-10-07 11:25:17,174 INFO org.apache.hadoop.hbase.master.HMaster:
Stopping infoServer
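
To line up the ZooKeeper timeouts with the file system failure, it can help to reduce a log excerpt like the one above to a timeline of WARN/ERROR/FATAL entries. The following is only a minimal sketch, assuming each entry starts with a "date time,millis LEVEL" prefix as in the excerpt; the names parse_events, problems, and SAMPLE are hypothetical and not part of HBase or ZooKeeper. The same pass could be run over the NameNode log if it uses the same prefix, so the two timelines can be compared.

```python
import re
from datetime import datetime

# Start of a log entry as it appears in the excerpt, e.g.
# "2009-10-07 11:25:17,032 ERROR org.apache...". The format is an assumption.
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (INFO|WARN|ERROR|FATAL)\s*(.*)$"
)


def parse_events(text):
    """Return a list of (timestamp, level, message) tuples.

    Lines that do not start a new entry (wrapped text, stack-trace frames)
    are appended to the message of the entry before them.
    """
    events = []
    for line in text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append([stamp, match.group(2), match.group(3).strip()])
        elif events and line.strip():
            joined = events[-1][2] + " " + line.strip()
            events[-1][2] = joined.strip()
    return [tuple(event) for event in events]


def problems(events):
    """Keep only the entries logged at WARN level or above."""
    return [event for event in events if event[1] in ("WARN", "ERROR", "FATAL")]


# A few lines copied from the excerpt, used only to exercise the sketch.
SAMPLE = """2009-10-07 11:25:06,311 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x1242b188e8a0001 to sun.nio.ch.SelectionKeyImpl@148c02f
java.io.IOException: TIMED OUT
2009-10-07 11:25:06,703 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster:
Shutting down HBase cluster: file system not available"""

if __name__ == "__main__":
    for stamp, level, message in problems(parse_events(SAMPLE)):
        print(stamp.isoformat(), level, message[:70])
```

Sorting the combined output of the master and NameNode logs by timestamp would show whether the NameNode problems started before or after the first ZooKeeper timeout.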



      