hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-17055) Disabling table not getting enabled after clean cluster restart.
Date Wed, 09 Nov 2016 23:29:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652370#comment-15652370
] 

Stephen Yuan Jiang edited comment on HBASE-17055 at 11/9/16 11:29 PM:
----------------------------------------------------------------------

The title "Disabling table not getting enabled after clean cluster restart." is misleading.
 The table did get enabled; just the one region of the table is still offline.  

For EnableTable operation, the logic does not change in all 1.x releases.  It is best effort
to online region (the logic is there before the logic moves to procedure), so it is doing
the right thing (based on code logic) if some region is not online.  The table still declares
it is online.  This part is not a problem.

[~sreenivasulureddy], my question is that {{master.AssignmentManager: Skip assigning testTable,,1478689618299.1890fa9c085dcc2ee0602f4bab069d10.,
it is on a dead but not processed yet server: host-1,16040,1478690163056}} - does the ServerShutdownHandler
procedure ever run to move the regions of {{host-1}} to other RS?  If the SSH did run and
skip this region, we probably have a corner case bug here.  Please either attach relevant
log for host-1 SSH or share out the master log in some places.




was (Author: syuanjiang):
The title "Disabling table not getting enabled after clean cluster restart." is misleading.
 The table did get enabled; just the one region of the table is still offline.  

For EnableTable operation, the logic does not change in all 1.x releases.  It is best effort
to online region (the logic is there before the logic moves to procedure), so it is doing
the right thing (based on code logic) if some region is not online.  The table still declares
it is online.  This part is not a problem.

[~sreenivasulureddy], my question is that {{master.AssignmentManager: Skip assigning testTable,,1478689618299.1890fa9c085dcc2ee0602f4bab069d10.,
it is on a dead but not processed yet server: host-1,16040,1478690163056}} - does the ServerShutdownHandler
procedure ever run on {{host-1}}.  If the SSH did run and skip this region, we probably have
a corner case bug here.  Please either attach relevant log for host-1 SSH or share out the
master log in some places.



> Disabling table not getting enabled after clean cluster restart.
> ----------------------------------------------------------------
>
>                 Key: HBASE-17055
>                 URL: https://issues.apache.org/jira/browse/HBASE-17055
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 1.3.0
>            Reporter: Y. SREENIVASULU REDDY
>             Fix For: 1.3.0
>
>
> scenario:
> 1. Disable the table, while disabling the table is in progress.
> 2. Restart whole HBase service.
> 3. Then enable the table.
> the above operation leads to RIT continously.
> pls find the below logs for understanding.
> while disabling the table whole hbase service went down.
> the following is the master logs
> {noformat}
> 2016-11-09 19:32:55,102 INFO  [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.HMaster: Client=seenu//host-1 disable testTable
> 2016-11-09 19:32:55,257 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
procedure2.ProcedureExecutor: Procedure DisableTableProcedure (table=testTable) id=8 owner=seenu
state=RUNNABLE:DISABLE_TABLE_PREPARE added to the store.
> 2016-11-09 19:32:55,264 DEBUG [ProcedureExecutor-5] lock.ZKInterProcessLockBase: Acquired
a lock for /hbase/table-lock/testTable/write-master:160000000000005
> 2016-11-09 19:32:55,285 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=8
> 2016-11-09 19:32:55,386 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=8
> 2016-11-09 19:32:55,513 INFO  [ProcedureExecutor-5] zookeeper.ZKTableStateManager: Moving
table testTable state from DISABLING to DISABLING
> 2016-11-09 19:32:55,587 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=8
> 2016-11-09 19:32:55,628 INFO  [ProcedureExecutor-5] procedure.DisableTableProcedure:
Offlining 1 regions.
> .
> .
> .
> .
> .
> .
> .
> .
> 2016-11-09 19:33:02,871 INFO  [AM.ZK.Worker-pool2-t7] master.RegionStates: Offlined 1890fa9c085dcc2ee0602f4bab069d10
from host-1,16040,1478690163056
> Wed Nov  9 19:33:02 CST 2016 Terminating master
> {noformat}
> here we need to observe
> {color:red} Offlined 1890fa9c085dcc2ee0602f4bab069d10 from host-1,16040,1478690163056
{color}
> then hmaster went down, all regionServers also made down.
> After hmaster and regionserver are restarted
> executed enable Table operation on the table.
> {panel:title=HMaster Logs|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#FFFFCE}
> {noformat}
> 2016-11-09 19:49:57,059 INFO  [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.HMaster: Client=seenu//host-1 enable testTable
> 2016-11-09 19:49:57,325 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
procedure2.ProcedureExecutor: Procedure EnableTableProcedure (table=testTable) id=9 owner=seenu
state=RUNNABLE:ENABLE_TABLE_PREPARE added to the store.
> 2016-11-09 19:49:57,333 DEBUG [ProcedureExecutor-2] lock.ZKInterProcessLockBase: Acquired
a lock for /hbase/table-lock/testTable/write-master:160000000000008
> 2016-11-09 19:49:57,335 DEBUG [hconnection-0x745317ee-shared--pool3-t11] ipc.RpcClientImpl:
Use SIMPLE authentication for service ClientService, sasl=false
> 2016-11-09 19:49:57,335 DEBUG [hconnection-0x745317ee-shared--pool3-t11] ipc.RpcClientImpl:
Connecting to host-1:16040
> 2016-11-09 19:49:57,347 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=9
> 2016-11-09 19:49:57,449 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=9
> 2016-11-09 19:49:57,579 INFO  [ProcedureExecutor-2] procedure.EnableTableProcedure: Attempting
to enable the table testTable
> 2016-11-09 19:49:57,580 INFO  [ProcedureExecutor-2] zookeeper.ZKTableStateManager: Moving
table testTable state from DISABLED to ENABLING
> 2016-11-09 19:49:57,655 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=9
> 2016-11-09 19:49:57,707 INFO  [ProcedureExecutor-2] procedure.EnableTableProcedure: Table
'testTable' has 1 regions, of which 1 are offline.
> 2016-11-09 19:49:57,707 INFO  [ProcedureExecutor-2] procedure.EnableTableProcedure: Bulk
assigning 1 region(s) across 1 server(s), retainAssignment=true
> 2016-11-09 19:49:57,710 DEBUG [ProcedureExecutor-2] master.GeneralBulkAssigner: Timeout-on-RIT=91000
> 2016-11-09 19:49:57,710 INFO  [host-1,16000,1478691456965-GeneralBulkAssigner-0] master.AssignmentManager:
Assigning 1 region(s) to host-1,16040,1478691644081
> {noformat}
> .
> .
> .
> {color:red}
> 2016-11-09 19:49:57,718 DEBUG [AM.-pool1-t1] master.AssignmentManager: Force region state
offline {1890fa9c085dcc2ee0602f4bab069d10 state=OFFLINE, ts=1478692197716, server=host-1,16040,1478690163056}
> 2016-11-09 19:49:57,722 INFO  [AM.-pool1-t1] master.AssignmentManager: Skip assigning
testTable,,1478689618299.1890fa9c085dcc2ee0602f4bab069d10., it is on a dead but not processed
yet server: host-1,16040,1478690163056
> {color}
> {noformat}
> 2016-11-09 19:49:57,957 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=9
> 2016-11-09 19:49:58,459 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=9
> 2016-11-09 19:49:59,462 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=9
> {noformat}
> .
> .
> .
> {noformat}
> 2016-11-09 19:51:38,720 DEBUG [ProcedureExecutor-2] master.GeneralBulkAssigner: bulk
assigning total 1 regions to 1 servers, took 101010ms, with 1 regions still in transition
> 2016-11-09 19:51:38,728 DEBUG [ProcedureExecutor-2] procedure.EnableTableProcedure: Skipping
assign for the region {ENCODED => 1890fa9c085dcc2ee0602f4bab069d10, NAME => 'testTable,,1478689618299.1890fa9c085dcc2ee0602f4bab069d10.',
STARTKEY => '', ENDKEY => ''} during enable table testTable because its already in tranition
or assigned.
> 2016-11-09 19:51:38,728 INFO  [ProcedureExecutor-2] procedure.EnableTableProcedure: Table
'testTable' has 1 regions, of which 0 are offline.
> 2016-11-09 19:51:38,840 INFO  [ProcedureExecutor-2] zookeeper.ZKTableStateManager: Moving
table testTable state from ENABLING to ENABLED
> {noformat}
> {color:red}
> 2016-11-09 19:51:38,846 INFO  [ProcedureExecutor-2] procedure.EnableTableProcedure: Table
'testTable' was successfully enabled. 
> {color}
> {noformat}
> 2016-11-09 19:51:39,081 DEBUG [ProcedureExecutor-2] lock.ZKInterProcessLockBase: Released
/hbase/table-lock/testTable/write-master:160000000000008
> 2016-11-09 19:51:39,081 DEBUG [ProcedureExecutor-2] procedure2.ProcedureExecutor: Procedure
completed in 1mins, 41.898sec: EnableTableProcedure (table=testTable) id=9 owner=seenu state=FINISHED
> 2016-11-09 19:51:45,485 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000]
master.MasterRpcServices: Checking to see if procedure is done procId=9
> 2016-11-09 19:53:08,485 DEBUG [RpcServer.reader=4,bindAddress=host-1,port=16000] ipc.RpcServer:
RpcServer.listener,port=16000: DISCONNECTING client host-1:40313 because read count=-1. Number
of active connections: 3
> 2016-11-09 19:53:45,504 DEBUG [RpcServer.reader=3,bindAddress=host-1,port=16000] ipc.RpcServer:
RpcServer.listener,port=16000: DISCONNECTING client host-1:40281 because read count=-1. Number
of active connections: 2
> {noformat}
> {color:red}
> 2016-11-09 19:55:48,650 DEBUG [host-1,16000,1478691456965_ChoreService_1] master.HMaster:
Not running balancer because 1 region(s) in transition: {1890fa9c085dcc2ee0602f4bab069d10={1890fa9c085dcc2ee0602f4bab069d10
state=OFFLINE, ts=1478692197716, server=host-1,16040,1478690163056}}
> {color}
> {noformat}
> 2016-11-09 19:55:48,650 WARN  [host-1,16000,1478691456965_ChoreService_1] master.CatalogJanitor:
CatalogJanitor disabled! Not running scan.
> 2016-11-09 19:57:09,336 DEBUG [ProcedureExecutorTimeout] procedure2.ProcedureExecutor$CompletedProcedureCleaner:
Evict completed procedure: Procedure=EnableTableProcedure (table=testTable) (id=9, owner=seenu,
state=FINISHED, startTime=7mins, 12sec ago, lastUpdate=7mins, 12sec ago)
> {noformat}
> .
> .
> .
> {color:red}
> 2016-11-09 20:10:48,408 DEBUG [host-1,16000,1478691456965_ChoreService_1] master.HMaster:
Not running balancer because 1 region(s) in transition: {1890fa9c085dcc2ee0602f4bab069d10={1890fa9c085dcc2ee0602f4bab069d10
state=OFFLINE, ts=1478692197716, server=host-1,16040,1478690163056}}
> {color}
> {panel}
> region is continuously is in RIT, because new hmaster is trying to make the region offline
request sending to dead regionserver. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message