hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13592) RegionServer sometimes gets stuck during shutdown in case of cache flush failures
Date Wed, 29 Apr 2015 21:49:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520337#comment-14520337 ]

Hudson commented on HBASE-13592:
--------------------------------

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #921 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/921/])
HBASE-13592 RegionServer sometimes gets stuck during shutdown in case of cache flush failures.
(Vikas Vishwakarma) (larsh: rev f6b418544e1174e960a188d6ac3eb0c0c2678af3)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> RegionServer sometimes gets stuck during shutdown in case of cache flush failures
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-13592
>                 URL: https://issues.apache.org/jira/browse/HBASE-13592
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.10
>            Reporter: Vikas Vishwakarma
>            Assignee: Vikas Vishwakarma
>             Fix For: 0.98.13
>
>         Attachments: HBASE-13592-0.98.patch
>
>
> Observed that the RegionServer sometimes gets stuck during shutdown after cache flush
> failures. After adding a few debug logs and examining the stack trace, the RegionServer process
> appears to be stuck in closeWAL -> hlog.close -> closeBarrier.stopAndDrainOps() during the
> shutdown sequence in the run method.
> The RegionServer logs show multiple attempts to flush the cache for a particular
> region. Each attempt increments the beginOp count in DrainBarrier, but every flush attempt fails
> somewhere in the WAL sync, so the matching DrainBarrier endOp decrement never happens. When
> shutdown is later initiated, the RegionServer process blocks here permanently.
> In this case hbase stop does not work either, and the RegionServer process has to be killed
> explicitly with kill -9.
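
The failure mode described above can be reproduced in miniature. The sketch below is illustrative only: `SimpleBarrier`, `leakyFlush`, `safeFlush`, and `failingWalSync` are hypothetical names, not HBase code, and the timeout on `stopAndDrainOps` is an assumption added so the demo terminates (the report describes a wait that never returns). It shows how a begin without a matching end leaves the drain waiting, and how a try/finally keeps the counts balanced. This is one common guard and is not necessarily what the attached patch does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class DrainBarrierSketch {

    /** Simplified, hypothetical stand-in for a drain barrier (not HBase's DrainBarrier). */
    static final class SimpleBarrier {
        private int activeOps = 0;
        private boolean draining = false;

        /** Registers an operation; refused once draining has started. */
        synchronized boolean beginOp() {
            if (draining) {
                return false;
            }
            activeOps++;
            return true;
        }

        /** Marks an operation as finished and wakes any thread waiting to drain. */
        synchronized void endOp() {
            activeOps--;
            if (activeOps == 0) {
                notifyAll();
            }
        }

        synchronized int activeOps() {
            return activeOps;
        }

        /**
         * Blocks new operations and waits for in-flight ones to end.
         * The timeout is an assumption for this demo; returns true only if fully drained.
         */
        synchronized boolean stopAndDrainOps(long timeoutMillis) {
            draining = true;
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (activeOps > 0) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false;
                }
                try {
                    wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    /** Stands in for a WAL sync that always fails. */
    static void failingWalSync() {
        throw new UncheckedIOException(new IOException("simulated WAL sync failure"));
    }

    /** Leaky pattern: the exception skips endOp, so the count is never decremented. */
    static void leakyFlush(SimpleBarrier barrier) {
        if (!barrier.beginOp()) {
            return;
        }
        failingWalSync();
        barrier.endOp(); // not reached when the sync throws
    }

    /** Balanced pattern: endOp runs whether or not the sync throws. */
    static void safeFlush(SimpleBarrier barrier) {
        if (!barrier.beginOp()) {
            return;
        }
        try {
            failingWalSync();
        } finally {
            barrier.endOp();
        }
    }

    public static void main(String[] args) {
        SimpleBarrier leaky = new SimpleBarrier();
        try {
            leakyFlush(leaky);
        } catch (UncheckedIOException e) {
            // flush failed; the barrier still counts one active operation
        }
        System.out.println("leaky drained: " + leaky.stopAndDrainOps(100));

        SimpleBarrier safe = new SimpleBarrier();
        try {
            safeFlush(safe);
        } catch (UncheckedIOException e) {
            // flush failed, but endOp already ran in the finally block
        }
        System.out.println("safe drained: " + safe.stopAndDrainOps(100));
    }
}
```

With an unbounded wait in place of the demo timeout, the leaky case would block forever, which matches the shutdown hang reported here.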



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
