hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Yu <yuzhih...@gmail.com>
Subject HBase went down
Date Fri, 11 Jun 2010 05:46:00 GMT
Hi,
We use hbase 0.20.3
HBase went down when we ran map-reduce job.

Your comment / help is appreciated.

I didn't find such WARN in master or region server log:
We slept 117947ms, ten times longer than scheduled: 10000

Here is master log snippet:

2010-06-11 04:32:50,077 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
T3_HBASE_NO_INCR_ORCL_SQLLDR-THREEGPPSPEECHCALLS-1276229903001-0,BCBF1C0DB6FCE3695709CC6175CF512D,1276230743333
in region .META.,,1 with startcode=1276204602423, server=10.201.8.208:60020
2010-06-11 04:33:16,002 INFO org.apache.hadoop.hbase.master.ServerManager:
sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195 znode expired
2010-06-11 04:33:16,003 INFO org.apache.hadoop.hbase.master.RegionManager:
-ROOT- region unset (but not set to be reassigned)
2010-06-11 04:33:16,003 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit: false,
rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
2010-06-11 04:33:16,005 INFO org.apache.hadoop.hbase.regionserver.HLog:
Splitting 3 hlog(s) in hdfs://
sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195
2010-06-11 04:33:16,098 WARN org.apache.hadoop.hbase.regionserver.HLog:
Empty hlog, continuing: org.apache.hadoop.fs.FileStatus@bc89a24a count=0
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:962)
    at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:872)
    at
org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:286)
    at
org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:494)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:429)
2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.regionserver.HLog: hlog
file splitting completed in 1020 millis for hdfs://
sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195
2010-06-11 04:33:17,024 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
meta reassignment and scanning:
2010-06-11 04:33:17,024 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: ProcessServerShutdown
reassigning ROOT region
2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.master.RegionManager:
-ROOT- region unset (but not set to be reassigned)
2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.master.RegionManager:
ROOT inserted into regionsInTransition
2010-06-11 04:33:17,064 INFO org.apache.hadoop.hbase.master.RegionManager:
Assigning region -ROOT-,,0 to sjc1-hadoop6.sjc1.carrieriq.com
,60020,1276204602430
2010-06-11 04:33:20,070 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from
sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430; 1 of 2
2010-06-11 04:33:20,070 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_OPEN: -ROOT-,,0 from
sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430;
2 of 2
2010-06-11 04:33:25,891 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server: 10.201.8.208:60020,
regionname: .META.,,1, startKey: <>}
2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 161 row(s) of meta region {server:
10.201.8.208:60020, regionname: .META.,,1, startKey: <>} complete
2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2010-06-11 04:33:29,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60000, call regionServerReport(address: 10.201.8.205:60020,
startcode: 1276204602195, load: (requests=19880, regions=24, usedHeap=533,
maxHeap=3983), [Lorg.apache.hadoop.hbase.HMsg;@6c9a819d,
[Lorg.apache.hadoop.hbase.HRegionInfo;@13b3a21f) from 10.201.8.205:60755:
error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
    at
org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:252)
    at
org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:724)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-06-11 04:33:40,350 INFO org.apache.hadoop.hbase.master.ServerManager: 5
region servers, 1 dead, average load 27.0[sjc1-hadoop1.sjc1.carrieriq.com
,60020,1276204602195]
2010-06-11 04:33:45,892 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {server: 10.201.8.210:60020,
regionname: -ROOT-,,0, startKey: <>}
2010-06-11 04:34:17,044 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit: true,
rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
2010-06-11 04:34:17,044 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
meta reassignment and scanning:
2010-06-11 04:34:24,195 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {server:
10.201.8.210:60020, regionname: -ROOT-,,0, startKey: <>} complete

10.201.8.205 is one of the region servers which went down.
Here is region server log from 10.201.8.205

2010-06-11 04:32:42,734 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
2010-06-11 04:32:42,734 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
2010-06-11 04:32:43,441 INFO org.apache.hadoop.hbase.regionserver.HRegion:
region
T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561/1645281535
available; sequence id is 342755
2010-06-11 04:32:43,515 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting compaction on region
T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
2010-06-11 04:33:10,793 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x12923b8c6b50001 to sun.nio.ch.SelectionKeyImpl@69267649
java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
2010-06-11 04:33:10,895 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Disconnected, type: None, path: null
2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181
2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected local=/
10.201.8.205:31261 remote=sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181]
2010-06-11 04:33:29,677 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2010-06-11 04:33:29,678 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Expired, type: None, path: null
2010-06-11 04:33:29,678 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x12923b8c6b50001 to sun.nio.ch.SelectionKeyImpl@32bc4106
java.io.IOException: Session Expired
    at
org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2010-06-11 04:33:29,678 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
expired
2010-06-11 04:33:29,678 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020, call put([B@2ba504e0,
[Lorg.apache.hadoop.hbase.client.Put;@7b420664) from 10.201.8.205:24720:
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020, call put([B@1d014ece,
[Lorg.apache.hadoop.hbase.client.Put;@40dcc542) from 10.201.8.210:3754:
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020, call put([B@2d4ed3d0,
[Lorg.apache.hadoop.hbase.client.Put;@123413ed) from 10.201.8.210:3755:
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

Later, I restarted region server but I saw:

2010-06-11 05:05:34,087 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed TED-THREEGPPSPEECHCALLS-1276213362068-17,,1276213463847
2010-06-11 05:05:34,087 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
MSG_REGION_CLOSE: MetaTable-15,,1276213160085: Overloaded
2010-06-11 05:05:34,088 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed MetaTable-15,,1276213160085
2010-06-11 05:06:07,818 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-06-11 05:06:07,820 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020, call getClosestRowBefore([B@51f88fbd, [B@15a4eb56,
[B@7e566633) from 10.201.8.212:2687: error:
org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message