hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase crashed: FATAL HMaster: Shutting down HBase cluster: file system not available
Date Wed, 07 Oct 2009 18:26:58 GMT
You can't allow swapping.
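A minimal sketch of how to check swap activity on a Linux node and keep the HBase/ZooKeeper heaps resident (standard /proc and sysctl tooling; the exact policy is site-specific, and the privileged commands are shown as comments only):

```shell
# How much swap is configured and in use on this node:
grep -E 'SwapTotal|SwapFree' /proc/meminfo

# Current swappiness (0-100; lower values make the kernel avoid swapping
# anonymous pages such as JVM heap):
cat /proc/sys/vm/swappiness

# To curb swapping on dedicated HBase nodes (requires root; illustrative,
# not executed here):
#   sysctl -w vm.swappiness=0   # persist the setting in /etc/sysctl.conf
#   swapoff -a                  # or remove swap entirely
```

If swap is in use while a region server is under load, ZooKeeper heartbeats can stall long enough for the session to expire, which matches the "znode expired" lines in your logs.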

   - Andy




________________________________
From: Lucas Nazário dos Santos <nazario.lucas@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Wed, October 7, 2009 10:05:44 AM
Subject: Re: HBase crashed: FATAL HMaster: Shutting down HBase cluster: file  system not available

There was an exception in the namenode log around the time HBase crashed (see
below).

Anyway, machines in the cluster frequently need to swap. That was the case
when HBase crashed last time.

Lucas



2009-10-07 11:24:55,049 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root
ip=/192.168.1.3 cmd=open
src=/usr/local/hadoop_data/hadoop-root/mapred/system/job_200910061432_0753/job.jar
dst=null        perm=null
2009-10-07 11:25:02,645 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root
ip=/192.168.1.3 cmd=open        src=/ninvest/feeds      dst=null
perm=null
2009-10-07 11:25:15,803 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root
ip=/192.168.1.2 cmd=open
src=/usr/local/hadoop_data/hadoop-root/mapred/system/job_200910061432_0753/job.xml
dst=null        perm=null
2009-10-07 11:25:15,853 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root
ip=/192.168.1.2 cmd=open
src=/usr/local/hadoop_data/hadoop-root/mapred/system/job_200910061432_0753/job.jar
dst=null        perm=null
2009-10-07 11:25:17,171 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 9000: readAndProcess threw exception java.io.IOException:
Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
        at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
        at
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
2009-10-07 11:25:18,629 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/hbase/.logs/server2,60020,1254853514050/hlog.dat.1254925491305.
blk_-6881522694925209803_244918
2009-10-07 11:25:18,634 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 192.168.1.2:50010 is added to
blk_-6881522694925209803_244918 size 119
2009-10-07 11:25:18,634 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 192.168.1.3:50010 is added to
blk_-6881522694925209803_244918 size 119
2009-10-07 11:25:18,636 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
728 Total time for transactions(ms): 20Number of transactions batched in
Syncs: 32 Number of syncs: 581 SyncTimes(ms): 12750
2009-10-07 11:31:28,581 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates: blk_5038589320720026567 is added to invalidSet
of 192.168.1.3:50010






On Wed, Oct 7, 2009 at 1:58 PM, Andrew Purtell <apurtell@apache.org> wrote:

> Looks like your DFS NameNode became unavailable about the same time that
> ZooKeeper timeouts started happening. Overloading? Anything relevant in the
> NameNode logs?
>
>   - Andy
>
>
>
>
> ________________________________
> From: Lucas Nazário dos Santos <nazario.lucas@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Wed, October 7, 2009 9:43:49 AM
> Subject: HBase crashed: FATAL HMaster: Shutting down HBase cluster: file
>  system not available
>
> Hello,
>
> My HBase cluster crashed today after a couple of days running, and the logs
> show the exception below (end of the message).
>
> Some log excerpts that took my attention are:
>
> 2009-10-07 11:25:17,032 ERROR org.apache.hadoop.hbase.master.HMaster:
> Master
> lost its znode, killing itself now
> 2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster:
> Shutting down HBase cluster: file system not available
>
> Any clue about what happened? What can I do to prevent this from occurring
> in the future?
>
> Thanks!
> Lucas
>
>
>
> 2009-10-07 11:24:42,823 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 9 row(s) of meta region {server:
> 192.168.1.3:60020, regionname: .META.,,1, startKey: <>} complete
> 2009-10-07 11:24:42,823 INFO org.apache.hadoop.hbase.master.BaseScanner:
> All
> 1 .META. region(s) scanned
> 2009-10-07 11:25:06,311 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x1242b188e8a0001 to sun.nio.ch.SelectionKeyImpl@148c02f
> java.io.IOException: TIMED OUT
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> 2009-10-07 11:25:06,702 INFO org.apache.zookeeper.ClientCnxn: Attempting
> connection to server server2/192.168.1.3:2181
> 2009-10-07 11:25:06,702 INFO org.apache.zookeeper.ClientCnxn: Priming
> connection to java.nio.channels.SocketChannel[connected local=/
> 192.168.1.3:49602 remote=server2/192.168.1.3:2181]
> 2009-10-07 11:25:06,703 INFO org.apache.zookeeper.ClientCnxn: Server
> connection successful
> 2009-10-07 11:25:16,911 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x242b1890c70000 to sun.nio.ch.SelectionKeyImpl@1060478
> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
> lim=4 cap=4]
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> 2009-10-07 11:25:16,911 INFO org.apache.hadoop.hbase.master.ServerManager:
> server2,60020,1254853514050 znode expired
> 2009-10-07 11:25:17,021 INFO org.apache.hadoop.hbase.master.RegionManager:
> META region removed from onlineMetaRegions
> 2009-10-07 11:25:17,032 ERROR org.apache.hadoop.hbase.master.HMaster:
> Master
> lost its znode, killing itself now
> 2009-10-07 11:25:17,032 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server server2,60020,1254853514050: logSplit: false, rootRescanned: false,
> numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
> 2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster:
> Shutting down HBase cluster: file system not available
> java.io.IOException: File system is not available
>        at
>
> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125)
>        at
> org.apache.hadoop.hbase.master.HMaster.checkFileSystem(HMaster.java:324)
>        at
> org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:525)
>        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426)
> Caused by: java.io.IOException: Filesystem closed
>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:197)
>        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:585)
>        at
>
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
>        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
>        at
>
> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114)
>        ... 3 more
> 2009-10-07 11:25:17,174 INFO org.apache.hadoop.hbase.master.HMaster:
> Stopping infoServer
>
>
>
>
>


