hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13418) Regions getting stuck in PENDING_CLOSE state infinitely in high load HA scenarios
Date Wed, 22 Apr 2015 20:09:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507791#comment-14507791 ]

Andrew Purtell commented on HBASE-13418:
----------------------------------------

bq. I think I saw something like that long time ago if using Hadoop 2.0 and SCRs.
Maybe. We are using SCR (short-circuit reads). Our Hadoop is based on 2.3.0 (CDH 5.0.1; yes, I know,
we are upgrading soon) plus the patch from HDFS-6440 ported to that version.

> Regions getting stuck in PENDING_CLOSE state infinitely in high load HA scenarios
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-13418
>                 URL: https://issues.apache.org/jira/browse/HBASE-13418
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.10
>            Reporter: Vikas Vishwakarma
>
> In some heavy data load cases, when multiple RegionServers are going up and down (HA) or when we try to shut down or restart the entire HBase cluster, we observe that some regions get stuck in the PENDING_CLOSE state indefinitely.
> Going through the logs for one region stuck in PENDING_CLOSE, it looks like two memstore flushes were triggered for this region within a few milliseconds of each other, as shown below, and some time later there is an "Unrecoverable exception while closing region" error. I suspect this could be some kind of race condition, but need to check further.
> Logs:
> ================
> ......
> 2015-04-06 11:47:33,309 INFO  [2,queue=0,port=60020] regionserver.HRegionServer - Close
884fd5819112370d9a9834895b0ec19c, via zk=yes, znode version=0, on blitzhbase01-dnds1-4-crd.eng.sfdc.net,60020,1428318111711
> 2015-04-06 11:47:33,309 DEBUG [-dnds3-4-crd:60020-0] handler.CloseRegionHandler - Processing
close of RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.
> 2015-04-06 11:47:33,319 DEBUG [-dnds3-4-crd:60020-0] regionserver.HRegion - Closing RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.:
disabling compactions & flushes
> 2015-04-06 11:47:33,319 INFO  [-dnds3-4-crd:60020-0] regionserver.HRegion - Running close
preflush of RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.
> 2015-04-06 11:47:33,319 INFO  [-dnds3-4-crd:60020-0] regionserver.HRegion - Started memstore
flush for RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.,
current region memstore size 70.0 M
> 2015-04-06 11:47:33,327 DEBUG [-dnds3-4-crd:60020-0] regionserver.HRegion - Updates disabled
for region RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.
> 2015-04-06 11:47:33,328 INFO  [-dnds3-4-crd:60020-0] regionserver.HRegion - Started memstore
flush for RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.,
current region memstore size 70.0 M
> 2015-04-06 11:47:33,328 WARN  [-dnds3-4-crd:60020-0] wal.FSHLog - Couldn't find oldest
seqNum for the region we are about to flush: [884fd5819112370d9a9834895b0ec19c]
> 2015-04-06 11:47:33,328 WARN  [-dnds3-4-crd:60020-0] regionserver.MemStore - Snapshot
called again without clearing previous. Doing nothing. Another ongoing flush or did we fail
last attempt?
> 2015-04-06 11:47:33,334 FATAL [-dnds3-4-crd:60020-0] regionserver.HRegionServer - ABORTING
region server blitzhbase01-dnds3-4-crd.eng.sfdc.net,60020,1428318082860: Unrecoverable exception
while closing region RMHA_1,\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1428318937003.884fd5819112370d9a9834895b0ec19c.,
still finishing close



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)