hbase-user mailing list archives

From Lucas Nazário dos Santos <nazario.lu...@gmail.com>
Subject Region server going down
Date Fri, 16 Oct 2009 20:29:53 GMT
Hi,

Today one region server crashed and I can't figure out why. Everything
started with the message "server,60020,1255644477834 znode expired". I'm
still running the cluster on little memory, and swapping gets in my way
from time to time (it's rare, but I need to fix it). Could that be the cause
of the error below? Do you think five minutes is enough for the
zookeeper.session.timeout property? And why do I get the message "wrong key
class: org.apache.hadoop.hbase.regionserver.HLogKey is not class
org.apache.hadoop.hbase.regionserver.transactional.THLogKey"?
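
For readers puzzling over the wording of that message, here is a minimal
sketch of the kind of check that could produce it. It assumes the reader
compares the class of the key object it is handed against the key class
recorded in the file header, which is how the message reads. The classes
below are hypothetical Python stand-ins, not the Hadoop or HBase
implementation.

```python
class HLogKey:
    """Stand-in for the plain log key type (hypothetical)."""


class THLogKey(HLogKey):
    """Stand-in for the transactional log key type (hypothetical)."""


class LogReader:
    """Hypothetical reader that remembers the key class named in the file header."""

    def __init__(self, header_key_class, records):
        self.header_key_class = header_key_class
        self.records = list(records)

    def next(self, key):
        # The exact class must match; a parent or subclass is rejected too.
        if type(key) is not self.header_key_class:
            raise IOError(
                "wrong key class: %s is not class %s"
                % (type(key).__name__, self.header_key_class.__name__)
            )
        return self.records.pop(0) if self.records else None


def try_read(reader, key):
    """Return the next record, or the error text if the key class is rejected."""
    try:
        return reader.next(key)
    except IOError as exc:
        return str(exc)
```

Under that assumption, the first class in the message is the key type the
reading side supplied and the second is the type recorded in the log file,
so the two sides would have to be set up with different key classes for the
error to appear.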

My tests show that whenever ZooKeeper "shakes", the whole cluster goes down.
Shouldn't HBase be more robust with respect to ZooKeeper? Something like a
retry strategy...
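
To make "retry strategy" concrete, the following is a minimal sketch of
retrying an operation with exponential backoff. It is only an illustration
of the idea, not how HBase or ZooKeeper handles session loss; the function
name call_with_retry and its parameters are made up for this example.

```python
import time


def call_with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying on ConnectionError with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    The last failure is re-raised so the caller can decide what to do next.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, give up
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so the backoff schedule can be checked
without actually waiting.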

Lucas



2009-10-16 15:07:32,167 INFO org.apache.hadoop.hbase.master.ServerManager: 2
region servers, 0 dead, average load 7.0
2009-10-16 15:07:32,537 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {server: 192.168.1.2:60020,
regionname: -ROOT-,,0, startKey: <>}
2009-10-16 15:07:32,560 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {server:
192.168.1.2:60020, regionname: -ROOT-,,0, startKey: <>} complete
2009-10-16 15:07:32,654 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server: 192.168.1.3:60020,
regionname: .META.,,1, startKey: <>}
2009-10-16 15:07:32,804 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 12 row(s) of meta region {server:
192.168.1.3:60020, regionname: .META.,,1, startKey: <>} complete
2009-10-16 15:07:32,804 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2009-10-16 15:08:09,551 INFO org.apache.hadoop.hbase.master.ServerManager:
server,60020,1255644477834 znode expired
2009-10-16 15:08:09,605 INFO org.apache.hadoop.hbase.master.RegionManager:
-ROOT- region unset (but not set to be reassigned)
2009-10-16 15:08:09,605 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
server server,60020,1255644477834: logSplit: false, rootRescanned: false,
numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
2009-10-16 15:08:09,623 INFO org.apache.hadoop.hbase.regionserver.HLog:
Splitting 20 hlog(s) in
hdfs://server2:9000/hbase/.logs/server,60020,1255644477834
2009-10-16 15:08:09,841 WARN org.apache.hadoop.hbase.regionserver.HLog:
Exception processing
hdfs://server2:9000/hbase/.logs/server,60020,1255644477834/hlog.dat.1255644478353
-- continuing. Possible DATA LOSS!
java.io.IOException: wrong key class:
org.apache.hadoop.hbase.regionserver.HLogKey is not class
org.apache.hadoop.hbase.regionserver.transactional.THLogKey
        at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1824)
        at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876)
        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:896)
        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:802)
        at
org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:274)
        at
org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:490)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:425)
2009-10-16 15:08:09,870 WARN org.apache.hadoop.hbase.regionserver.HLog:
Exception processing
hdfs://server2:9000/hbase/.logs/server,60020,1255644477834/hlog.dat.1255648058463
-- continuing. Possible DATA LOSS!
java.io.IOException: wrong key class:
org.apache.hadoop.hbase.regionserver.HLogKey is not class
org.apache.hadoop.hbase.regionserver.transactional.THLogKey
        at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1824)
        at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876)
        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:896)
        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:802)
        at
org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:274)
        at
org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:490)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:425)
2009-10-16 15:08:09,886 WARN org.apache.hadoop.hbase.regionserver.HLog:
Exception processing hdfs://server2:9000/hbase/.logs/server,60020,12556

// More wrong key class errors...

2009-10-16 15:08:10,203 INFO org.apache.hadoop.hbase.regionserver.HLog: hlog
file splitting completed in 594 millis for
hdfs://server2:9000/hbase/.logs/server,60020,1255644477834
2009-10-16 15:08:10,203 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
meta reassignment and scanning:
2009-10-16 15:08:10,203 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: ProcessServerShutdown
reassigning ROOT region
2009-10-16 15:08:10,203 INFO org.apache.hadoop.hbase.master.RegionManager:
-ROOT- region unset (but not set to be reassigned)
2009-10-16 15:08:10,203 INFO org.apache.hadoop.hbase.master.RegionManager:
ROOT inserted into regionsInTransition
2009-10-16 15:08:32,167 INFO org.apache.hadoop.hbase.master.ServerManager: 1
region servers, 1 dead, average load 6.0[server,60020,1255644477834]
