hbase-user mailing list archives

From 张磊 <zhang...@youku.com>
Subject one RegionServer crashed and the whole cluster was blocked
Date Thu, 18 Oct 2012 11:30:48 GMT
Hi all,

  One of the RegionServers in our company’s cluster crashed. When this
happened, I observed the following:

1.       All the RegionServers stopped handling requests from the client
side (requestsPerSecond=0 on the master-status UI page).

2.       The cluster took about 12-15 minutes to recover.

3.       I have set hbase.regionserver.restart.on.zk.expire to true, but it
does not seem to take effect.
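
For reference, the setting in item 3 would look roughly like the following in
hbase-site.xml. This is a minimal sketch that assumes the standard
property/name/value layout of that file; it is illustrative and not copied
from the actual cluster configuration.

```xml
<!-- Sketch of an hbase-site.xml fragment (assumed layout, not the real file). -->
<configuration>
  <property>
    <!-- Property named in item 3: ask the RegionServer to restart itself
         after its ZooKeeper session expires instead of aborting. -->
    <name>hbase.regionserver.restart.on.zk.expire</name>
    <value>true</value>
  </property>
</configuration>
```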

  For 1, I know the cluster started splitting the logs to recover the data of
the crashed RegionServer. Does this recovery operation block all requests
from the client side?

  For 2, is there any way to reduce the recovery time?

  For 3, I checked the log and found a “session is timeout” exception; perhaps
a full GC caused the ZooKeeper session to time out. But why does
hbase.regionserver.restart.on.zk.expire not work? My HBase version is


  Thanks for any suggestions and feedback!


Fowler Zhang

