hbase-dev mailing list archives

From liushaohui <liushao...@xiaomi.com>
Subject Why not a restart region server serve the WAL logs the last RS Write?
Date Wed, 16 Jan 2013 11:26:37 GMT
Dear HBase Devs,

When I restart the HBase cluster, the WAL logs of all region servers are
split, even though all the region servers come back up immediately.

From the master code, I found that the HBase master identifies each
region server by ip,port,start-time. From the master's point of view, a
new region server with the same ip and port is a different server from
the old one, so it puts the old region server's logs into the split
queue. When the cluster has about 500 regions, it usually takes 2 to 4
minutes to bring all regions online.
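
As a rough illustration of the check described above (not the actual
HBase master code), the sketch below treats each server name as the
string host,port,startcode and flags every log folder whose name is not
among the currently registered servers. The class and method names are
made up for this example; the two server names are taken from the logs
further down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of HBase.
final class LogFolderCheck {

    // Returns the log folder names that do not exactly match a registered
    // server name of the form "host,port,startcode".
    static List<String> foldersToSplit(List<String> logFolders, Set<String> registered) {
        List<String> toSplit = new ArrayList<>();
        for (String folder : logFolders) {
            if (!registered.contains(folder)) {
                toSplit.add(folder);
            }
        }
        return toSplit;
    }

    public static void main(String[] args) {
        Set<String> registered = new HashSet<>(Arrays.asList(
            "sd-ml-hadoop23.bj,11600,1358320047485"));
        List<String> folders = Arrays.asList(
            "sd-ml-hadoop23.bj,11600,1358152154355",  // written before the restart
            "sd-ml-hadoop23.bj,11600,1358320047485"); // written by the current process
        // Same host and port, but the old start code makes the first folder unknown.
        System.out.println(foldersToSplit(folders, registered));
    }
}
```

Because the comparison is on the full name, a restart alone is enough
to make the old folder look like it belongs to a different server.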


Why not let the restarted region server serve its old WAL logs, to avoid
log splitting and reduce recovery time?
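
One way to picture the idea is to compare only the host and port of an
old log folder against the registered servers and ignore the start
code. The sketch below is purely hypothetical: HBase does not do this,
the class and method names are invented, and it says nothing about
whether handing the logs back this way would be safe.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposal; HBase does not work this way.
final class RestartMatcher {

    // "host,port,startcode" -> "host,port"
    static String hostAndPort(String serverName) {
        int lastComma = serverName.lastIndexOf(',');
        if (lastComma < 0) {
            throw new IllegalArgumentException("not a server name: " + serverName);
        }
        return serverName.substring(0, lastComma);
    }

    // Maps each old log folder to the registered server that has the same
    // host and port but a different start code, if there is one.
    static Map<String, String> matchRestarted(List<String> oldFolders, List<String> registered) {
        Map<String, String> byHostPort = new HashMap<>();
        for (String server : registered) {
            byHostPort.put(hostAndPort(server), server);
        }
        Map<String, String> matches = new HashMap<>();
        for (String folder : oldFolders) {
            String current = byHostPort.get(hostAndPort(folder));
            if (current != null && !current.equals(folder)) {
                matches.put(folder, current);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> m = matchRestarted(
            Arrays.asList("sd-ml-hadoop23.bj,11600,1358152154355"),
            Arrays.asList("sd-ml-hadoop23.bj,11600,1358320047485"));
        // Old folder name mapped to the restarted server on the same host and port.
        System.out.println(m);
    }
}
```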

There is the graceful rs-stop script, which makes the region server
flush the memstores, close the regions, and delete the WAL logs before
stopping.

But how can we reduce recovery time and prevent unnecessary log splits
when a rack or a datacenter loses power?

Here are the logs:

2013-01-16 15:08:32,842 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sd-ml-hadoop23.bj,11600,1358320047485
2013-01-16 15:08:32,842 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sd-ml-hadoop26.bj,11600,1358320078576
2013-01-16 15:08:32,842 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sd-ml-hadoop25.bj,11600,1358320068311
2013-01-16 15:08:32,842 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sd-ml-hadoop24.bj,11600,1358320057835
2013-01-16 15:08:32,845 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-16 15:08:32,891 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 4, slept for 351 ms, expecting minimum of 1, maximum of 2147483647, timeout of 10000 ms, interval of 1500 ms.
2013-01-16 15:08:34,395 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 4, slept for 1854 ms, expecting minimum of 1, maximum of 2147483647, master is running.
2013-01-16 15:08:34,398 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://hdfs/hbase/sdtst-miliao/.logs/sd-ml-hadoop23.bj,11600,1358152154355 doesn't belong to a known region server, splitting
2013-01-16 15:08:34,398 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://hdfs/hbase/sdtst-miliao/.logs/sd-ml-hadoop23.bj,11600,1358320047485 belongs to an existing region server

-Shaohui Liu

