hbase-user mailing list archives

From Jimmy Xiang <jxi...@cloudera.com>
Subject Re: could not start HMaster
Date Mon, 15 Oct 2012 20:32:18 GMT
Is your /tmp folder cleaned up automatically, so that some files are now gone?

Thanks,
Jimmy

On Mon, Oct 15, 2012 at 12:26 PM,  <Yuling_C@dell.com> wrote:
> Hi,
>
> I set up a single-node HBase server on top of Hadoop, and it has been working fine with
> most of my testing scenarios, such as creating tables and inserting data. Over the weekend,
> I accidentally left a testing script running that inserted about 67 rows every minute for three
> days. Today when I looked at the environment, I found that the HBase master could no longer be
> started. Digging into the logs, I could see that starting from the second day, HBase first
> got an exception as follows:
>
> 2012-10-13 13:05:07,367 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1350155105992,
entries=7981, filesize=3754556.  for /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1350158707364
> 2012-10-13 13:05:07,367 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: moving old
hlog file /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442
whose highest sequenceid is 4 to /tmp/hbase-root/hbase/.oldlogs/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442
> 2012-10-13 13:05:07,379 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING
region server sflow-linux02.santanet.dell.com,47137,1348606516541: IOE in log roller
> java.io.FileNotFoundException: File file:/tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442
does not exist.
>        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
>        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:213)
>        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:163)
>        at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:287)
>        at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:428)
>        at org.apache.hadoop.hbase.regionserver.wal.HLog.archiveLogFile(HLog.java:825)
>        at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanOldLogs(HLog.java:708)
>        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:603)
>        at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>        at java.lang.Thread.run(Thread.java:662)
>
> Then SplitLogManager kept splitting the logs for about two days:
> 2012-10-13 13:05:09,061 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of
stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x139ff3656b30003,
likely client has closed socket
>        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
>        at java.lang.Thread.run(Thread.java:662)
> 2012-10-13 13:05:09,061 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket
connection for client /127.0.0.1:52573 which had sessionid 0x139ff3656b30003
> 2012-10-13 13:05:09,082 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2012-10-13 13:05:09,085 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
Splitting logs for sflow-linux02.santanet.dell.com,47137,1348606516541
> 2012-10-13 13:05:09,086 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog
worker sflow-linux02.santanet.dell.com,47137,1348606516541
> 2012-10-13 13:05:09,101 INFO org.apache.hadoop.hbase.master.SplitLogManager: started
splitting logs in [file:/tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541-splitting]
> 2012-10-13 13:05:14,545 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;sflow-linux02.santanet.dell.com,47137,1348606516541.leaseChecker
closing leases
> 2012-10-13 13:05:14,545 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;sflow-linux02.santanet.dell.com,47137,1348606516541.leaseChecker
closed leases
> 2012-10-13 13:08:09,275 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000028
entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
> 2012-10-13 13:11:09,730 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000029
entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
> 2012-10-13 13:14:10,171 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000030
entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
>
> When I tried to restart the HBase server today, the following exception occurred:
> 2012-10-15 11:54:10,122 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete
on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x13a65c6a8090002, negotiated
timeout = 40000
> 2012-10-15 11:54:10,124 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0
orphan tasks and 0 rescan nodes
> 2012-10-15 11:54:10,238 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception.
Starting shutdown.
> org.apache.hadoop.hbase.util.FileSystemVersionException: File system needs to be upgraded.
 You have version null and I want version 7.  Run the '${HBASE_HOME}/bin/hbase migrate' script.
>        at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:245)
>        at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:347)
>        at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
>
>
> Just wondering what happened, and whether there is any way to recover from this situation.
> Is re-installing HBase my only choice at this point?
>
> Thanks very much,
>
> YuLing
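
One way to check the /tmp question raised in the reply is to look at the root directory named in the logs (/tmp/hbase-root/hbase) and see what is still there. The sketch below is only an illustration, not part of HBase: the helper name inspect_rootdir and its tmp_prefix parameter are hypothetical, and it assumes the version marker that FSUtils.checkVersion looks for is a file called hbase.version directly under the root directory. If that file is missing, it would be consistent with the "You have version null" message in the startup log.

```python
import os

# Assumption: the file system version marker is named "hbase.version"
# and lives directly under the HBase root directory.
VERSION_FILE = "hbase.version"


def inspect_rootdir(rootdir, tmp_prefix="/tmp"):
    """Report whether a local HBase root directory looks intact.

    rootdir    -- local path of the HBase root directory (hypothetical input)
    tmp_prefix -- directory that a periodic cleaner may purge
    """
    rootdir = os.path.abspath(rootdir)
    tmp_prefix = os.path.abspath(tmp_prefix)
    return {
        # Does the root directory itself still exist?
        "rootdir_exists": os.path.isdir(rootdir),
        # Is the version marker still present under it?
        "version_file_exists": os.path.isfile(
            os.path.join(rootdir, VERSION_FILE)
        ),
        # Is the root directory located inside the cleaned-up prefix?
        "under_tmp": os.path.commonpath([rootdir, tmp_prefix]) == tmp_prefix,
    }
```

Calling inspect_rootdir("/tmp/hbase-root/hbase") on the affected machine would show whether the directory and its marker survived. If the data does live under /tmp, a likely long-term fix is to point hbase.rootdir in hbase-site.xml at a directory that is not cleaned automatically.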
