hbase-user mailing list archives

From charan kumar <charan.ku...@gmail.com>
Subject Region Servers Crashing during Random Reads
Date Thu, 03 Feb 2011 19:47:29 GMT
Hello,

I am using HBase 0.90.0 with hadoop-append. Hardware: Dell 1950, 2 CPUs,
6 GB RAM.

Nine of my 30 region servers crashed within a span of 30 minutes during a
period of heavy reads. To me it looks like long GC pauses leading to
ZooKeeper session timeouts. I have applied all the recommended configuration
from the HBase wiki. Any other suggestions?


2011-02-03T09:43:07.890-0800: 70693.632: [GC 70693.632: [ParNew (promotion
failed): 5555K->5540K(5568K), 0.0280950 secs]70693.660:
[CMS2011-02-03T09:43:16.864-0800: 70702.606: [CMS-concurrent-mark:
12.549/69.323 secs] [Times: user=11.90 sys=1.26, real=69.31 secs]

2011-02-03T09:53:35.165-0800: 71320.785: [GC 71320.785: [ParNew (promotion
failed): 5568K->5568K(5568K), 0.4384530 secs]71321.224:
[CMS2011-02-03T09:53:45.111-0800: 71330.731: [CMS-concurrent-mark:
17.511/51.564 secs] [Times: user=38.72 sys=5.67, real=51.60 secs]
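
To cross-check GC records like the ones above against a ZooKeeper session timeout, a minimal sketch along the following lines could pull out the wall-clock `real=` durations and flag the long ones. The function name `long_gc_phases` and the 40-second threshold are hypothetical and are not taken from this cluster's configuration. Note also that `real=` is the elapsed time of the logged phase, which is not necessarily a stop-the-world pause.

```python
import re

# Wall-clock duration at the end of a GC log record, e.g. "real=69.31 secs".
# \s+ tolerates the line wrapping that mail clients introduce.
REAL_RE = re.compile(r"real=(\d+(?:\.\d+)?)\s+secs")


def long_gc_phases(log_text, threshold_secs):
    """Return every real= duration (in seconds) that exceeds threshold_secs."""
    durations = [float(value) for value in REAL_RE.findall(log_text)]
    return [d for d in durations if d > threshold_secs]


# Two records copied from the GC log in this message.
SAMPLE = (
    "[CMS-concurrent-mark: 12.549/69.323 secs] "
    "[Times: user=11.90 sys=1.26, real=69.31 secs]\n"
    "[CMS-concurrent-mark: 17.511/51.564 secs] "
    "[Times: user=38.72 sys=5.67, real=51.60 secs]\n"
)

if __name__ == "__main__":
    # 40.0 is an assumed threshold for illustration only.
    for duration in long_gc_phases(SAMPLE, 40.0):
        print(f"GC phase lasted {duration:.2f} s")
```

In practice the threshold would be set to whatever session timeout the region servers actually negotiate with ZooKeeper.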


The following are the log entries from the region server:

2011-02-03 10:37:43,946 INFO org.apache.zookeeper.ClientCnxn: Client session
timed out, have not heard from server in 47172ms for sessionid
0x12db9f722421ce3, closing socket connection and attempting reconnect
2011-02-03 10:37:43,947 INFO org.apache.zookeeper.ClientCnxn: Client session
timed out, have not heard from server in 48159ms for sessionid
0x22db9f722501d93, closing socket connection and attempting reconnect
2011-02-03 10:37:44,401 INFO org.apache.zookeeper.ClientCnxn: Opening socket
connection to server XXXXXXXXXXXXXXXX
2011-02-03 10:37:44,402 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to XXXXXXXXX, initiating session
2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn: Opening socket
connection to server XXXXXXXXXXXXXXX
2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to XXXXXXXXXXXXXXXXXXXXX, initiating session
2011-02-03 10:37:44,767 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
started; Attempting to free 81.93 MB of total=696.25 MB
2011-02-03 10:37:44,784 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
completed; freed=81.94 MB, total=614.81 MB, single=379.98 MB, multi=309.77
MB, memory=0 KB
2011-02-03 10:37:45,205 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x22db9f722501d93 has expired,
closing socket connection
2011-02-03 10:37:45,206 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
This client just lost it's session with ZooKeeper, trying to reconnect.
2011-02-03 10:37:45,453 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Trying to reconnect to zookeeper
2011-02-03 10:37:45,206 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x12db9f722421ce3 has expired,
closing socket connection
gionserver:60020-0x22db9f722501d93 regionserver:60020-0x22db9f722501d93
received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
        at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:328)
        at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:246)
        at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
handled exception: org.apache.hadoop.hbase.YouAreDeadException: Server
REPORT rejected; currently processing XXXXXXXXXXXX,60020,1296684296172 as
dead server
org.apache.hadoop.hbase.YouAreDeadException:
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
currently processing XXXXXXXXXXXX,60020,1296684296172 as dead server
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
        at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:80)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:729)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:586)
        at java.lang.Thread.run(Thread.java:619)
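
The "have not heard from server in ...ms" entries above give the length of each silence as seen by the ZooKeeper client. As a rough sketch (the helper name `silent_intervals` is hypothetical), they could be collected per session and compared with the GC timings:

```python
import re

# Matches e.g. "have not heard from server in 47172ms for sessionid 0x12db...".
# \s+ between words tolerates wrapped log lines.
TIMEOUT_RE = re.compile(
    r"have\s+not\s+heard\s+from\s+server\s+in\s+(\d+)ms"
    r"\s+for\s+sessionid\s+(0x[0-9a-fA-F]+)"
)


def silent_intervals(log_text):
    """Map each session id to the longest silence (in ms) logged for it."""
    worst = {}
    for millis, session in TIMEOUT_RE.findall(log_text):
        worst[session] = max(worst.get(session, 0), int(millis))
    return worst


# Two entries copied from the region server log in this message.
SAMPLE_ZK = (
    "Client session\ntimed out, have not heard from server in 47172ms "
    "for sessionid\n0x12db9f722421ce3, closing socket connection\n"
    "Client session\ntimed out, have not heard from server in 48159ms "
    "for sessionid\n0x22db9f722501d93, closing socket connection\n"
)

if __name__ == "__main__":
    for session, millis in silent_intervals(SAMPLE_ZK).items():
        print(f"{session}: silent for {millis / 1000:.1f} s")
```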



Thanks,
Charan
