hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1123) Server never leaves the dead list though logs have all been processed if crashed server had -ROOT- (seemingly)
Date Mon, 19 Jan 2009 18:21:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665177#action_12665177
] 

Jim Kellerman commented on HBASE-1123:
--------------------------------------

A dead server is not removed from the deadServer list until ProcessServerShutdown has split
the logs and
scanned all the meta regions for regions that the dead server was serving.

The root region will get reassigned immediately after the log is split (if the dead server
was serving the
root region).

If the dead server was serving other regions, ProcessServerShutdown is requeued to wait for
the root
region to come back on-line.

Once the root region is back on-line, it can be scanned for any meta regions that the dead
server was
serving. If there are any, they are marked as unassigned and ProcessServerShutdown is requeued
to
wait for all the meta regions to be on-line.

Once all the meta regions are on-line, they can be scanned and any that were being served
by the
dead server are marked as unassigned.

At this point, the dead server is removed from the dead server list.

> Server never leaves the dead list though logs have all been processed if crashed server
had -ROOT- (seemingly)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1123
>                 URL: https://issues.apache.org/jira/browse/HBASE-1123
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.20.0
>
>
> Cluster is just hung after host that had -ROOT- completed splitting its logs... old server
is just stuck on the dead list and never comes off it.
> {code}
> ..
> 2009-01-13 01:09:36,448 [HMaster] DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting
6 of 6: hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/log_XX.XX.XX.142_1231717984112_60020/hlog.dat.1231718928939
> 2009-01-13 01:09:37,396 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager:
Waiting on XX.XX.XX142:60020 removal from dead list before processing report-for-duty request
> 2009-01-13 01:09:38,591 [HMaster] DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating
new log file writer for path hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/TestTable/712889985/oldlogfile.log
and region TestTable,0040922294,1231559109829
> 2009-01-13 01:09:38,670 [HMaster] DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating
new log file writer for path hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/TestTable/484208094/oldlogfile.log
and region TestTable,0042007133,1231628296909
> 2009-01-13 01:09:45,096 [HMaster] INFO org.apache.hadoop.hbase.regionserver.HLog: log
file splitting completed for hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/log_XX.XX.XX.142_1231717984112_60020
> 2009-01-13 01:09:47,317 [SocketListener0-2] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Cache hit for row <> in tableName .META.: location serverXX.XX.XX.142:60020, location
region name .META.,,1
> 2009-01-13 01:09:47,416 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager:
Waiting on XX.XX.XX142:60020 removal from dead list before processing report-for-duty request
> 2009-01-13 01:09:47,518 [IPC Server handler 3 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region -ROOT-,,0 to server XX.XX.XX141:60020
> 2009-01-13 01:09:49,007 [IPC Server handler 6 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager:
Total Load: 430, Num Servers: 3, Avg Load: 144.0
> 2009-01-13 01:09:50,219 [SocketListener0-0] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Cache hit for row <> in tableName .META.: location server XX.XX.XX.142:60020, location
region name .META.,,1
> 2009-01-13 01:09:50,539 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from XX.XX.XX.141:60020
> 2009-01-13 01:09:50,539 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN: -ROOT-,,0 from 208.76.44.141:60020
> 2009-01-13 01:09:50,719 [SocketListener0-3] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Cache hit for row <> in tableName .META.: location server XX.XX.XX.142:60020, location
region name .META.,,1
> 2009-01-13 01:09:50,967 [SocketListener0-4] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Cache hit for row <> in tableName .META.: location serverXX.XX.XX.142:60020, location
region name .META.,,1
> 2009-01-13 01:09:52,117 [SocketListener0-5] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Cache hit for row <> in tableName .META.: location server XX.XX.XX.142:60020, location
region name .META.,,1
> ....
> 2009-01-13 01:09:57,426 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager:
Waiting on XX.XX.XX.142:60020 removal from dead list before processing report-for-duty request
> ....
> 2009-01-13 01:10:45,156 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing
todo: ProcessServerShutdown of XX.XX.XX142:60020
> 2009-01-13 01:10:45,156 [HMaster] INFO org.apache.hadoop.hbase.master.RegionServerOperation:
process shutdown of server XX.XX.XX.142:60020: logSplit: true, rootRescanned: false, numberOfMetaRegions:
1, onlineMetaRegions.size(): 1
> 2009-01-13 01:10:45,156 [HMaster] DEBUG org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanRootRegion:
process server shutdown scanning root region on XX.XX.XX.141
> 2009-01-13 01:10:45,182 [HMaster] DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
process server shutdown scanning root region on XX.XX.XX.141 finished HMaster
> 2009-01-13 01:10:45,183 [HMaster] DEBUG org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanMetaRegions:
process server shutdown scanning .META.,,1 on XX.XX.XX.142:60020
> 2009-01-13 01:10:47,496 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager:
Waiting on XX.XX.XX.142:60020 removal from dead list before processing report-for-duty request
> 2009-01-13 01:10:49,320 [IPC Server handler 8 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager:
Total Load: 431, Num Servers: 3, Avg Load: 144.0
> .....
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message