hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
Date Wed, 12 Nov 2014 04:56:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207682#comment-14207682 ]

Hudson commented on HBASE-12319:

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #637 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/637/])
HBASE-12319: Inconsistencies during region recovery due to close/open of a region during recovery
(jeffreyz: rev 78c1a919c906649427defdcaa17d8ab73bbb9482)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerNoMaster.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

> Inconsistencies during region recovery due to close/open of a region during recovery
> ------------------------------------------------------------------------------------
>                 Key: HBASE-12319
>                 URL: https://issues.apache.org/jira/browse/HBASE-12319
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.7, 0.99.1
>            Reporter: Devaraj Das
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.98.8, 0.99.2
>         Attachments: HBASE-12319-v2.patch, HBASE-12319.patch
> In one of my test runs, I saw the following:
> {noformat}
> 2014-10-14 13:45:30,782 DEBUG [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04, isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true
> 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Found 3 recovered edits file(s) under hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d
> .............
> .............
> 2014-10-14 13:45:31,916 WARN  [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Null or non-existent edits file: hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0000000000000198080
> {noformat}
> The above logs are from a regionserver, say RS2. From the initial analysis, it seems that
> the master asked a certain regionserver (say RS1) to open the region and, for some reason,
> asked it to close the region soon after. The open was still proceeding on RS1 when the master
> reassigned the region to RS2. RS2 also started the recovery, but it ended up seeing an
> inconsistent view of the recovered-edits files (it reports missing files, as in the logs
> above) because RS1 deleted some of those files after it completed its own recovery. When RS2
> finally opens the region, it might not see the recent data that was written by flushes on
> hor9n10 during the recovery process. Reads of that data would then be inconsistent.
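
The interleaving suspected in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HBase code: the `RegionOpen` class, its three steps (`list_edits`, `replay_edits`, `cleanup`), the `simulate` function, and the file names `edits-a` to `edits-c` are all hypothetical, and a shared Python set stands in for the region's `recovered.edits` directory. It shows how a second opener that lists the files before the first opener deletes them can then find every listed file missing.

```python
# Toy model of the suspected race. All names are hypothetical; this is not
# how HBase implements region opening or recovered-edits replay.

class RegionOpen:
    """One region server's attempt to open a region, split into explicit steps."""

    def __init__(self, server, edits_dir):
        self.server = server
        self.edits_dir = edits_dir  # shared set standing in for recovered.edits
        self.listed = []            # files this opener saw when it listed the directory
        self.missing = []           # listed files that were gone at replay time

    def list_edits(self):
        # Step 1: take a snapshot of the recovered-edits files.
        self.listed = sorted(self.edits_dir)

    def replay_edits(self):
        # Step 2: replay each listed file, recording any that no longer exist.
        self.missing = [name for name in self.listed if name not in self.edits_dir]

    def cleanup(self):
        # Step 3: after a completed replay, delete the files that were replayed.
        for name in self.listed:
            self.edits_dir.discard(name)


def simulate(overlap):
    """Return the files RS2 finds missing, with or without overlapping opens."""
    edits_dir = {"edits-a", "edits-b", "edits-c"}  # made-up file names
    rs1 = RegionOpen("RS1", edits_dir)
    rs2 = RegionOpen("RS2", edits_dir)

    rs1.list_edits()
    rs1.replay_edits()
    if overlap:
        rs2.list_edits()  # RS2 starts while RS1's open is still in progress
        rs1.cleanup()     # RS1 finishes and deletes the files
    else:
        rs1.cleanup()     # RS1 finishes completely before RS2 starts
        rs2.list_edits()
    rs2.replay_edits()
    return rs2.missing


if __name__ == "__main__":
    print(simulate(overlap=True))   # ['edits-a', 'edits-b', 'edits-c']
    print(simulate(overlap=False))  # []
```

In the overlapping case RS2 reports files that it listed but can no longer read, which corresponds to the "Null or non-existent edits file" warning in the logs above. When the two opens do not overlap, RS2 never lists a file that later disappears.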

This message was sent by Atlassian JIRA
