hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1123) Server never leaves the dead list though logs have all been processed if crashed server had -ROOT- (seemingly)
Date Mon, 26 Jan 2009 20:13:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667384#action_12667384 ]

Jim Kellerman commented on HBASE-1123:
--------------------------------------

On the hbase-0.19 branch, I could not reproduce this. I killed the server holding -ROOT- while the cluster was under load, and it exited the waiting state in 1:01 (min:sec):

{code}
2009-01-26 19:26:32,266 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region -ROOT-,,0 to server 208.76.44.141:8020

2009-01-26 19:41:08,396 INFO org.apache.hadoop.hbase.master.ServerManager: 208.76.44.141:8020 lease expired
2009-01-26 19:42:10,757 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation: Removed 208.76.44.141:8020 from deadservers Map
{code}

I then waited for the cluster to rebalance, put it under load again, and killed the server holding the root region. This time it took a little longer (2 min 19 sec) before the server was removed from the dead list.

{code}
2009-01-26 19:41:11,808 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region -ROOT-,,0 to server 208.76.44.139:8020

2009-01-26 19:49:01,966 INFO org.apache.hadoop.hbase.master.ServerManager: 208.76.44.139:8020 lease expired
2009-01-26 19:51:20,354 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation: Removed 208.76.44.139:8020 from deadservers Map
{code}
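For anyone repeating this test, the waiting time can be read straight off the two ServerManager/RegionServerOperation lines. The following is only a small sketch of that arithmetic; the helper names are mine, and it assumes each log line starts with a timestamp in the "YYYY-MM-DD HH:MM:SS,mmm" layout seen above. Applied to the two runs, it gives values within about a second of the durations quoted in this comment.

```python
from datetime import datetime

# Assumed layout of the timestamp at the start of each master log line.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
LOG_TS_LENGTH = 23  # len("2009-01-26 19:41:08,396")


def parse_log_timestamp(line):
    """Parse the leading timestamp of a log line into a datetime."""
    return datetime.strptime(line[:LOG_TS_LENGTH], LOG_TS_FORMAT)


def seconds_between(earlier_line, later_line):
    """Elapsed seconds between the timestamps of two log lines."""
    delta = parse_log_timestamp(later_line) - parse_log_timestamp(earlier_line)
    return delta.total_seconds()


if __name__ == "__main__":
    expired = ("2009-01-26 19:41:08,396 INFO "
               "org.apache.hadoop.hbase.master.ServerManager: "
               "208.76.44.141:8020 lease expired")
    removed = ("2009-01-26 19:42:10,757 DEBUG "
               "org.apache.hadoop.hbase.master.RegionServerOperation: "
               "Removed 208.76.44.141:8020 from deadservers Map")
    print(seconds_between(expired, removed))
```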

However, if leases included the start code, we could have put the restarted server back into service much sooner, since the restarted server would not interfere with splitting the crashed server's logs (whose names include the start code).
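To make the idea concrete, here is a rough sketch of the difference between tracking dead servers by address alone and by address plus start code. It is not the HBase master code: ServerIdentity, DeadServerTracker and their methods are hypothetical names invented for illustration, and the addresses and start codes in the usage are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerIdentity:
    """Hypothetical identity of one region server process."""
    host: str
    port: int
    start_code: int  # assumed to be the time at which the process started


class DeadServerTracker:
    """Hypothetical dead list; entries stay until log splitting is done."""

    def __init__(self):
        self._dead = set()

    def mark_dead(self, server):
        self._dead.add(server)

    def log_split_finished(self, server):
        self._dead.discard(server)

    def may_report_for_duty(self, server):
        # Keyed on host, port AND start code: a restarted process has a
        # new start code, so the crashed instance does not block it.
        return server not in self._dead

    def may_report_for_duty_by_address(self, server):
        # Keyed on host and port only: the restarted process must wait
        # until the crashed instance has left the dead list.
        return all((d.host, d.port) != (server.host, server.port)
                   for d in self._dead)


if __name__ == "__main__":
    crashed = ServerIdentity("10.0.0.1", 60020, 1000)
    restarted = ServerIdentity("10.0.0.1", 60020, 2000)
    tracker = DeadServerTracker()
    tracker.mark_dead(crashed)
    print(tracker.may_report_for_duty(restarted))
    print(tracker.may_report_for_duty_by_address(restarted))
```

The sketch only illustrates the bookkeeping; whether the lease mechanism itself can carry the start code is the open question raised above.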



> Server never leaves the dead list though logs have all been processed if crashed server had -ROOT- (seemingly)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1123
>                 URL: https://issues.apache.org/jira/browse/HBASE-1123
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.20.0
>
>         Attachments: 1123.patch
>
>
> Cluster is just hung after host that had -ROOT- completed splitting its logs... old server is just stuck on the dead list and never comes off it.
> {code}
> ..
> 2009-01-13 01:09:36,448 [HMaster] DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 6 of 6: hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/log_XX.XX.XX.142_1231717984112_60020/hlog.dat.1231718928939
> 2009-01-13 01:09:37,396 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX142:60020 removal from dead list before processing report-for-duty request
> 2009-01-13 01:09:38,591 [HMaster] DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for path hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/TestTable/712889985/oldlogfile.log and region TestTable,0040922294,1231559109829
> 2009-01-13 01:09:38,670 [HMaster] DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for path hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/TestTable/484208094/oldlogfile.log and region TestTable,0042007133,1231628296909
> 2009-01-13 01:09:45,096 [HMaster] INFO org.apache.hadoop.hbase.regionserver.HLog: log file splitting completed for hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/log_XX.XX.XX.142_1231717984112_60020
> 2009-01-13 01:09:47,317 [SocketListener0-2] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location serverXX.XX.XX.142:60020, location region name .META.,,1
> 2009-01-13 01:09:47,416 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX142:60020 removal from dead list before processing report-for-duty request
> 2009-01-13 01:09:47,518 [IPC Server handler 3 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region -ROOT-,,0 to server XX.XX.XX141:60020
> 2009-01-13 01:09:49,007 [IPC Server handler 6 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 430, Num Servers: 3, Avg Load: 144.0
> 2009-01-13 01:09:50,219 [SocketListener0-0] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server XX.XX.XX.142:60020, location region name .META.,,1
> 2009-01-13 01:09:50,539 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from XX.XX.XX.141:60020
> 2009-01-13 01:09:50,539 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: -ROOT-,,0 from 208.76.44.141:60020
> 2009-01-13 01:09:50,719 [SocketListener0-3] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server XX.XX.XX.142:60020, location region name .META.,,1
> 2009-01-13 01:09:50,967 [SocketListener0-4] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location serverXX.XX.XX.142:60020, location region name .META.,,1
> 2009-01-13 01:09:52,117 [SocketListener0-5] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server XX.XX.XX.142:60020, location region name .META.,,1
> ....
> 2009-01-13 01:09:57,426 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX.142:60020 removal from dead list before processing report-for-duty request
> ....
> 2009-01-13 01:10:45,156 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessServerShutdown of XX.XX.XX142:60020
> 2009-01-13 01:10:45,156 [HMaster] INFO org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of server XX.XX.XX.142:60020: logSplit: true, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> 2009-01-13 01:10:45,156 [HMaster] DEBUG org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanRootRegion: process server shutdown scanning root region on XX.XX.XX.141
> 2009-01-13 01:10:45,182 [HMaster] DEBUG org.apache.hadoop.hbase.master.RegionServerOperation: process server shutdown scanning root region on XX.XX.XX.141 finished HMaster
> 2009-01-13 01:10:45,183 [HMaster] DEBUG org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanMetaRegions: process server shutdown scanning .META.,,1 on XX.XX.XX.142:60020
> 2009-01-13 01:10:47,496 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX.142:60020 removal from dead list before processing report-for-duty request
> 2009-01-13 01:10:49,320 [IPC Server handler 8 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 431, Num Servers: 3, Avg Load: 144.0
> .....
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

