hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-17570) rsgroup server move can get stuck if unassigning fails
Date Fri, 03 Feb 2017 00:13:51 GMT

     [ https://issues.apache.org/jira/browse/HBASE-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-17570.
---------------------------
    Resolution: Duplicate

Fixed by HBASE-17350

> rsgroup server move can get stuck if unassigning fails
> ------------------------------------------------------
>
>                 Key: HBASE-17570
>                 URL: https://issues.apache.org/jira/browse/HBASE-17570
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: stack
>             Fix For: 2.0.0
>
>
> This is pretty easy to repro in a standalone setup on the master branch. The master
> branch has the 'fake' Master regionserver, which shows up as a regionserver in the
> rsgroup 'default' group. If I create a new group and then try moving servers to it,
> the move usually gets stuck in the loop below and never breaks out (I have to kill
> the master).
> Looking at the code, RSGroupAdminServer#moveServers has a loop that will go on
> forever; there is no timeout and no maximum number of tries.
> Maybe we don't see this much in a 'real' cluster. Filing this issue in the meantime
> because the move should not keep retrying forever; it should give up and fail.
> {code}
> 2017-01-30 21:34:46,340 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
rsgroup.RSGroupAdminServer: Unassigning 1 regions from server localhost:50143 for move to
xx
> 2017-01-30 21:34:46,341 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=OPEN, ts=1485840806167,
server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=PENDING_CLOSE,
ts=1485840886341, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,341 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
with state=PENDING_CLOSE
> 2017-01-30 21:34:46,347 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50143]
regionserver.RSRpcServices: Close 8ebaa5bd7a2e906429a7b91bb2bee333 without moving
> 2017-01-30 21:34:46,348 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion:
Flushing 1/1 column families, memstore=431 B
> 2017-01-30 21:34:46,406 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.DefaultStoreFlusher:
Flushed, sequenceid=7, memsize=431, hasBloomFilter=true, into tmp file file:/var/folders/d8/8lyxycpd129d4fj7lb684dwh0000gp/T/hbase-stack/hbase/data/hbase/rsgroup/8ebaa5bd7a2e906429a7b91bb2bee333/.tmp/m/999d93adf36b4406bb73dc64e0158a05
> 2017-01-30 21:34:46,422 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HStore:
Added file:/var/folders/d8/8lyxycpd129d4fj7lb684dwh0000gp/T/hbase-stack/hbase/data/hbase/rsgroup/8ebaa5bd7a2e906429a7b91bb2bee333/m/999d93adf36b4406bb73dc64e0158a05,
entries=2, sequenceid=7, filesize=4.9 K
> 2017-01-30 21:34:46,422 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion:
Finished memstore flush of ~431 B/431, currentsize=0 B/0 for region hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
in 74ms, sequenceid=7, compaction requested=false
> 2017-01-30 21:34:46,425 INFO  [StoreCloserThread-hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.-1]
regionserver.HStore: Closed m
> 2017-01-30 21:34:46,437 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion:
Closed hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
> 2017-01-30 21:34:46,440 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50141]
master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=PENDING_CLOSE, ts=1485840886341,
server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=CLOSED, ts=1485840886440,
server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,440 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50141]
master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
with state=CLOSED
> 2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] balancer.BaseLoadBalancer: Wanted to do
retain assignment but no servers to assign to
> 2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.AssignmentManager: Can't find a destination
for 8ebaa5bd7a2e906429a7b91bb2bee333
> 2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.AssignmentManager: Unable to determine
a plan to assign {ENCODED => 8ebaa5bd7a2e906429a7b91bb2bee333, NAME => 'hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.',
STARTKEY => '', ENDKEY => ''}
> 2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.RegionStates: Failed to open/close
8ebaa5bd7a2e906429a7b91bb2bee333 on localhost,50143,1485840800161, set to FAILED_OPEN
> 2017-01-30 21:34:46,442 INFO  [AM.-pool3-t1] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333
state=CLOSED, ts=1485840886440, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333
state=FAILED_OPEN, ts=1485840886442, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,442 INFO  [AM.-pool3-t1] master.RegionStateStore: Updating hbase:meta
row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=FAILED_OPEN
> 2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory:
Accepted socket connection from /0:0:0:0:0:0:0:1:50272
> 2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer:
Refusing session request for client /0:0:0:0:0:0:0:1:50272 as it has seen zxid 0x25e our last
zxid is 0xae client must try another server
> 2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn:
Closed socket connection for client /0:0:0:0:0:0:0:1:50272 (no session established for client)
> 2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
rsgroup.RSGroupAdminServer: Unassigning 2 regions from server localhost:50143 for move to
xx
> 2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840886442,
server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE,
ts=1485840887353, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
with state=OFFLINE
> 2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
balancer.BaseLoadBalancer: Wanted to do retain assignment but no servers to assign to
> 2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.AssignmentManager: Can't find a destination for 8ebaa5bd7a2e906429a7b91bb2bee333
> 2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 8ebaa5bd7a2e906429a7b91bb2bee333,
NAME => 'hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.', STARTKEY =>
'', ENDKEY => ''}
> 2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStates: Failed to open/close 8ebaa5bd7a2e906429a7b91bb2bee333 on localhost,50143,1485840800161,
set to FAILED_OPEN
> 2017-01-30 21:34:47,355 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887353,
server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN,
ts=1485840887355, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,355 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
with state=FAILED_OPEN
> 2017-01-30 21:34:47,356 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840887355,
server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE,
ts=1485840887356, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,356 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
with state=OFFLINE
> {code}
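
For illustration only, here is a minimal sketch of the kind of bound the description asks for: a retry loop that gives up after a maximum number of tries or a timeout instead of spinning forever. This is not the HBase code and not the fix that went in with HBASE-17350. The class name BoundedMove, the method runWithLimits, and all of its parameters are hypothetical, and the attemptOnce callback merely stands in for one "unassign whatever regions are still on the source server" pass.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, NOT part of HBase: bounds a "retry until done" loop.
final class BoundedMove {
    private BoundedMove() {}

    /**
     * Calls attemptOnce until it returns true. Gives up once maxAttempts calls
     * have been made or timeoutMillis has elapsed, whichever comes first.
     * Returns the number of attempts used on success; throws on failure.
     */
    static int runWithLimits(BooleanSupplier attemptOnce, int maxAttempts,
                             long timeoutMillis, long sleepMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attemptOnce.getAsBoolean()) {
                return attempt; // nothing left to unassign, the move can proceed
            }
            if (attempt == maxAttempts) {
                break; // out of tries, do not sleep again
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                    "move timed out after " + attempt + " attempts");
            }
            try {
                Thread.sleep(sleepMillis); // back off before the next pass
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("move interrupted", e);
            }
        }
        throw new IllegalStateException(
            "move failed after " + maxAttempts + " attempts");
    }
}
```

A caller would pass the per-pass work as the callback, for example BoundedMove.runWithLimits(() -> regionsLeft() == 0, 10, 60_000L, 1_000L), where regionsLeft() is again a made-up name. With a bound like this, the FAILED_OPEN/OFFLINE cycle in the log above would end in a failed move rather than requiring a master restart.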



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
