hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location
Date Tue, 19 May 2015 01:36:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549628#comment-14549628 ]

Enis Soztutar commented on HBASE-11536:

bq. I agree with you that relying on timestamps, while it would work 9.99999999999999999999%
of the time, because of Murphy's law, the one time it fails, it'd be a high profile situation
and we'd all have to go home (smile).
Murphy loves me. We can reproduce a failure caused by this on EC2 clusters a lot more frequently
than what you quote. It happens more often for region replicas, because the master writes null
entries to the server columns for replicas and then assigns them. When a replica gets assigned,
if the RS's clock lags behind the master's by more than the time the assignment
took (a couple of seconds), then the server location becomes null for the replica.
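
The clock-skew case can be sketched as a toy model. This is only an illustration, not HBase code: it assumes each writer stamps its put with its own wall clock and that a read returns the version with the highest timestamp. The function names, the host name, and the tie-breaking rule are all made up for the example.

```python
def latest_value(versions):
    """Return the value of the newest version; ties go to the later write (an assumption)."""
    # Each version is a tuple: (timestamp_ms, write_order, value).
    return max(versions, key=lambda v: (v[0], v[1]))[2]


def replica_location(master_ts_ms, assign_duration_ms, rs_clock_lag_ms):
    """Model the server column of one replica after a single assignment."""
    # The master writes a null server entry, stamped with the master's clock.
    versions = [(master_ts_ms, 0, None)]
    # The region server finishes opening assign_duration_ms later in real time,
    # but stamps its put with its own clock, which lags by rs_clock_lag_ms.
    rs_ts_ms = master_ts_ms + assign_duration_ms - rs_clock_lag_ms
    versions.append((rs_ts_ms, 1, "rs1.example.com:16020"))  # hypothetical host
    return latest_value(versions)
```

With a 2 second assignment, a lag of 500 ms still yields the region server's location, while a lag of 3 seconds leaves the null entry as the newest version.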

Anyway, I opened HBASE-13709 detailing the issue. Please take a look.

> Puts of region location to Meta may be out of order which causes inconsistent of region location
> ------------------------------------------------------------------------------------------------
>                 Key: HBASE-11536
>                 URL: https://issues.apache.org/jira/browse/HBASE-11536
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Critical
>             Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>         Attachments:,, 11536-trunk.txt, HBASE-11536-0.94-v1.diff
> In a production HBase cluster, we found an inconsistency in the region location in the meta table.
Region cdfa2ed711bbdf054d9733a92fd43eb5 is online on regionserver but
the region location in the meta table is
> This is caused by out-of-order puts to the meta table.
> # HMaster tries to assign the region to
> # RegionServer: While opening the region, the put of the region
location( to the meta table times out (60s) and the htable retries a second
time. (The regionserver serving meta has already received the put request. The timeout occurs because there
is a bad disk in this regionserver and the sync of the hlog is very slow.)
> During the retry in the htable, the OpenRegionHandler times out (100s) and the PostOpenDeployTasksThread
is interrupted. Although the htable is eventually closed in the MetaEditor, the shared connection
the htable used is not closed, and the put call to the meta table is still in flight on that connection.
Let this in-flight put to meta be call A.
> # RegionServer: Because the OpenRegionHandler timed out, it
marks the assignment state of this region as FAILED_OPEN.
> # HMaster watches for this FAILED_OPEN event and assigns the region to another regionserver:
> # RegionServer: This regionserver opens the region successfully.
Let the put of the region location( to the meta table from this regionserver
be call B.
> There is no ordering guarantee between calls A and B. If call A is processed after call B in
the regionserver serving the meta region, the region location in the meta table will be wrong.
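
The race between calls A and B can be reproduced in a few lines. This is a minimal sketch, not HBase code: it assumes the server holding meta stamps each put when it processes it and that a read returns the newest version. `MetaCell`, `final_location`, and the two server names are hypothetical.

```python
from itertools import count


class MetaCell:
    """Toy info:server cell: puts are stamped on arrival, reads return the newest."""

    def __init__(self):
        self._clock = count(1)  # stand-in for the meta server's clock
        self.versions = []      # list of (timestamp, value)

    def put(self, value):
        self.versions.append((next(self._clock), value))

    def get(self):
        # Timestamps are unique here, so max() picks the last processed put.
        return max(self.versions)[1]


def final_location(arrival_order):
    """Apply puts in the order the meta server processes them."""
    cell = MetaCell()
    for value in arrival_order:
        cell.put(value)
    return cell.get()


CALL_A = "old-rs"  # delayed retry from the server whose open timed out
CALL_B = "new-rs"  # put from the server that actually opened the region
```

`final_location([CALL_A, CALL_B])` ends with the correct server, while `final_location([CALL_B, CALL_A])` leaves the stale location as the newest version, which matches the ordering visible in the raw scan.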
> From the raw scan of meta table we found:
> {code}
> scan '.META.', {RAW => true, LIMIT => 1, VERSIONS => 10, STARTROW => 'xxx.adfa2ed711bbdf054d9733a92fd43eb5.'}

> {code}
> {quote}
> xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885460553(=>
Wed Jul 09 13:57:40 +0800 2014), value= --> Retry put from
> xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885456731(=>
Wed Jul 09 13:57:36 +0800 2014), value= --> put from
> xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885353122(=>
Wed Jul 09 13:55:53 +0800 2014), value= --> First put from
> {quote}
> The related HBase log is attached to this issue, and discussion is welcome.
> Since there is no ordering guarantee for puts from different htables, one solution for this
issue is to give each assignment of a region an increasing id and use this id as the timestamp
of the put of the region location to the meta table. HBase clients will then get the region
location with the largest assignment id.
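
The proposed fix can be sketched the same way. Again this is only an illustration under stated assumptions: the master hands out an increasing assignment id, each put carries that id as its explicit timestamp, and a read returns the version with the highest timestamp. `MetaCell`, `final_location`, and the server names are hypothetical and are not taken from any attached patch.

```python
class MetaCell:
    """Toy info:server cell keyed by an explicit, caller-supplied timestamp."""

    def __init__(self):
        self.versions = {}  # timestamp -> value

    def put(self, timestamp, value):
        # A retried put with the same timestamp just rewrites the same version.
        self.versions[timestamp] = value

    def get(self):
        return self.versions[max(self.versions)]


def final_location(arrival_order):
    """Apply (assign_id, server) puts in the order the meta server processes them."""
    cell = MetaCell()
    for assign_id, server in arrival_order:
        cell.put(assign_id, server)
    return cell.get()


CALL_A = (1, "old-rs")  # first assignment, whose put is delayed and retried
CALL_B = (2, "new-rs")  # second assignment, which actually succeeded
```

Whether the puts arrive as A then B, or as B then A followed by a retry of A, the read returns `new-rs`, because the order is decided by the assignment id rather than by arrival time or by any machine's clock.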

This message was sent by Atlassian JIRA
