hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13635) Regions stuck in transition because master is incorrectly assumed dead
Date Wed, 06 May 2015 17:25:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530978#comment-14530978 ]

Elliott Clark commented on HBASE-13635:
---------------------------------------

This is a write-dominated cluster, so the RPC call queues are configured as follows:
'hbase.ipc.server.callqueue.handler.factor': 0.7
'hbase.ipc.server.callqueue.read.ratio': 0.3
'hbase.ipc.server.callqueue.scan.ratio': 0.2
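The Java sketch below gives a rough idea of how these three ratios carve up the handler pool. It is loosely modeled on the HBase 1.x read/write call queue split, but the exact rounding rules are an assumption. The handler count of 30 is also assumed, because the comment does not state it. `HandlerSplit` and its methods are hypothetical names.

```java
// Hypothetical sketch, loosely modeled on the HBase 1.x read/write call queue split.
// Rounding rules and the handler count are assumptions, not taken from HBase source.
public final class HandlerSplit {

    /** Writers get what is left after reserving round(count * readShare) slots (at least 1) for readers. */
    static int numWriters(int count, double readShare) {
        return Math.max(1, count - Math.max(1, (int) Math.round(count * readShare)));
    }

    /** Readers (gets plus scans) get the remainder. */
    static int numReaders(int count, double readShare) {
        return count - numWriters(count, readShare);
    }

    /** Scan slots are carved out of the reader slots, rounded down; none if that would leave no readers. */
    static int numScanners(int readers, double scanShare) {
        int scanners = Math.max(0, (int) Math.floor(readers * scanShare));
        return (readers - scanners > 0) ? scanners : 0;
    }

    public static void main(String[] args) {
        int handlerCount = 30;      // assumed; not stated in the comment
        double handlerFactor = 0.7; // hbase.ipc.server.callqueue.handler.factor
        double readShare = 0.3;     // hbase.ipc.server.callqueue.read.ratio
        double scanShare = 0.2;     // hbase.ipc.server.callqueue.scan.ratio

        // Number of separate call queues the handlers are spread across.
        int callQueues = Math.max(1, (int) Math.round(handlerCount * handlerFactor));

        int writeHandlers = numWriters(handlerCount, readShare);
        int readers = numReaders(handlerCount, readShare);
        int scanHandlers = numScanners(readers, scanShare);
        int readHandlers = readers - scanHandlers;

        System.out.printf("call queues=%d, handlers: write=%d read=%d scan=%d%n",
                callQueues, writeHandlers, readHandlers, scanHandlers);
    }
}
```

With these assumed inputs, the sketch assigns 21 handlers to writes, 8 to reads and 1 to scans. Every request that is neither a mutation nor a scan has to share those 8 handlers.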

It looks like the scheduler assumes that any request that isn't a mutate is a read request, so every non-mutate request goes to the very small set of read handlers.
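The routing rule described above can be sketched as a classifier in which only mutations count as writes. The enum names are hypothetical stand-ins, not HBase classes. The separate scan pool reflects the `scan.ratio` setting. Routing a region-state-transition report through this rule is an illustration only; the comment does not say so explicitly.

```java
// Hypothetical sketch of the routing rule: only mutations are classified as writes,
// scans go to the scan pool, and every other call type falls through to the read pool.
public final class CallRouting {

    /** Illustrative call types; these are not HBase class names. */
    enum CallType { MUTATE, MULTI_WITH_MUTATIONS, GET, SCAN, REPORT_REGION_STATE_TRANSITION }

    enum Pool { WRITE, READ, SCAN }

    static Pool route(CallType type) {
        switch (type) {
            case MUTATE:
            case MULTI_WITH_MUTATIONS:
                return Pool.WRITE;
            case SCAN:
                return Pool.SCAN;
            default:
                // Not a mutation, so it is assumed to be a read, even if handling
                // it ends up writing somewhere else (for example to hbase:meta).
                return Pool.READ;
        }
    }

    public static void main(String[] args) {
        for (CallType type : CallType.values()) {
            System.out.println(type + " -> " + route(type));
        }
    }
}
```

Under this rule, a `REPORT_REGION_STATE_TRANSITION` call lands in the read pool alongside ordinary gets. It therefore competes for the same handful of read handlers.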

All of the read handlers are stuck waiting on mutations to hbase:meta.

The call queue is full, so any new call blocks. That is why the master is considered dead.
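The sketch below uses a plain `java.util.concurrent.ArrayBlockingQueue` to show what happens when a call queue fills up and its handlers never drain it. It is a generic illustration, not HBase's RPC server code. The capacity, the timeouts and the `submit` helper are hypothetical. A timed `offer` stands in for a caller that eventually gives up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bounded call queue whose handlers are all stuck,
// so nothing ever takes calls off it. Capacity and timeouts are illustrative.
public final class FullCallQueue {

    /** Tries to enqueue one call; returns false if the queue stays full for the whole timeout. */
    static boolean submit(BlockingQueue<String> queue, String call, long timeoutMs) {
        try {
            return queue.offer(call, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> callQueue = new ArrayBlockingQueue<>(2);
        // No consumer thread is started: this models read handlers that are all
        // blocked and never pick up new calls.
        System.out.println(submit(callQueue, "call-1", 10)); // true
        System.out.println(submit(callQueue, "call-2", 10)); // true
        // The queue is now full, so this caller waits out its timeout and gets false.
        System.out.println(submit(callQueue, "reportRegionStateTransition", 100)); // false
    }
}
```

The first two submissions succeed immediately. The third returns false after its timeout. A caller that keeps seeing this cannot tell a saturated server from an unresponsive one, which is consistent with the master being treated as dead.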

> Regions stuck in transition because master is incorrectly assumed dead
> ----------------------------------------------------------------------
>
>                 Key: HBASE-13635
>                 URL: https://issues.apache.org/jira/browse/HBASE-13635
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 1.0.0
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>
> On master I see:
> {code}
> 15/05/05 20:56:38 INFO master.HMaster: balance hri=hbase:meta,,1.1588230740, src=hbase1375.prn2.facebook.com,16020,1430858968368, dest=hbase1377.prn2.facebook.com,16020,1430884264554
> 15/05/05 20:56:38 INFO master.RegionStates: Transition {1588230740 state=OPEN, ts=1430876450098, server=hbase1375.prn2.facebook.com,16020,1430858968368} to {1588230740 state=PENDING_CLOSE, ts=1430884598277, server=hbase1375.prn2.facebook.com,16020,1430858968368}
> Tue May 05 21:01:54 PDT 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60724: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hbase1375.prn2.facebook.com,16020,1430858968368, seqNum=0
> Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60724: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hbase1375.prn2.facebook.com,16020,1430858968368, seqNum=0
> {code}
> On the regionserver I see the following log spew:
> {code}
> 15/05/06 09:30:11 INFO regionserver.HRegionServer: Failed to report region transition, will retry
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hbasectrl054.prn2.facebook.com/10.104.157.28:16020
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:694)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:880)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:849)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1173)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
> 	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRegionStateTransition(RegionServerStatusProtos.java:8325)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportRegionStateTransition(HRegionServer.java:1863)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportRegionStateTransition(HRegionServer.java:1837)
> 	at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:157)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



