hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14987) Compaction marker whose region name doesn't match current region's needs to be handled
Date Tue, 22 Dec 2015 22:25:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068823#comment-15068823 ]

Enis Soztutar commented on HBASE-14987:

Thanks Ted, Stephen for working on this.

To summarize the issue and some of the discussion so far, for those who have not been following closely:
The root cause is that when HBCK decides to fix an overlap, it creates a new region and moves
all the files and folders, including {{recovered.edits}}, from the old overlapping regions
within the range into the new region. Moving the data files is fine. However, when
recovered.edits is moved to the new region, replaying the compaction markers throws
WrongRegionException. The other edits are already skipped (in 1.0+) if region names do not
match in log split.

Compaction marker replay is used for recovered.edits through regular log split, through distributed
log replay, and by region replica replication for secondary regions (where they replay the compaction
from the primary). In the log split case we want to skip the edits (because of the HBCK case), but for
secondary region replication we still want to throw the exception if the regions do not match.
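
The two behaviours described above can be sketched roughly as follows. This is only an
illustration, not the HBase implementation: the class, the enum, the method name and the
local WrongRegionException stand-in are all hypothetical, and the real check involves more
than a string comparison of encoded region names.

```java
// Illustrative sketch only; every name here is hypothetical, not the HBase API.
final class CompactionMarkerPolicy {

    /** Local stand-in for HBase's WrongRegionException (assumption for this sketch). */
    static final class WrongRegionException extends RuntimeException {
        WrongRegionException(String message) {
            super(message);
        }
    }

    /** Where the compaction marker being replayed came from. */
    enum ReplaySource { RECOVERED_EDITS, REPLICA_REPLICATION }

    /**
     * Returns true if the marker should be replayed and false if it should be skipped.
     * A mismatch is tolerated (skipped) for recovered.edits, since HBCK may have moved
     * the file from another region, but is still an error for replica replication.
     */
    static boolean shouldReplay(String markerEncodedRegion,
                                String expectedEncodedRegion,
                                ReplaySource source) {
        if (markerEncodedRegion.equals(expectedEncodedRegion)) {
            return true;
        }
        if (source == ReplaySource.RECOVERED_EDITS) {
            return false; // skip quietly, like the other mismatched edits in log split
        }
        throw new WrongRegionException("Compaction marker for region " + markerEncodedRegion
            + " does not match this region: " + expectedEncodedRegion);
    }
}
```

With the encoded names from the report, a marker for d389c70fde9ec07971d0cfd20ef8f575 found in
the recovered.edits of fa8a526f2578eb3630bb08a4b1648f5d would be skipped under this sketch, while
the same mismatch arriving via replica replication would still throw.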

Now, coming to the patch, instead of this: 
+                replayWALCompactionMarker(compaction, false, true, Long.MAX_VALUE,
+                  !checkRowWithinBoundary);
can we do this: 
+                if (checkRowWithinBoundary) {
+                  replayWALCompactionMarker(compaction, false, true, Long.MAX_VALUE);
+                }
Sending a boolean to replayWALCompactionMarker() that makes it fail every time should be avoided.
We should simply not call the method in that case.
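
To make the difference concrete, here is a minimal self-contained sketch of the caller-side
guard. It is not the patch itself: replayWALCompactionMarker() is reduced to a one-argument
stand-in that records its input, and handleCompactionEdit() is a hypothetical name for the
surrounding replay code. Only the flag name checkRowWithinBoundary is taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; simplified stand-ins, not the real HRegion methods.
final class GuardedReplaySketch {

    // Records which markers were actually replayed, so the effect of the guard is observable.
    final List<String> replayed = new ArrayList<>();

    // Simplified stand-in: the real method takes more arguments and does real work.
    void replayWALCompactionMarker(String compaction) {
        replayed.add(compaction);
    }

    // Hypothetical caller: the decision is made here, so the replay method
    // needs no extra boolean parameter telling it to do nothing.
    void handleCompactionEdit(String compaction, boolean checkRowWithinBoundary) {
        if (checkRowWithinBoundary) {
            replayWALCompactionMarker(compaction);
        }
    }
}
```

Keeping the decision in the caller leaves the signature of replayWALCompactionMarker()
unchanged for its other callers.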

The new test case uses region replica replication via secondary regions. Ideally, however,
we would like to test the compaction replay through recovered.edits, which is not related to
secondary replicas.

> Compaction marker whose region name doesn't match current region's needs to be handled
> --------------------------------------------------------------------------------------
>                 Key: HBASE-14987
>                 URL: https://issues.apache.org/jira/browse/HBASE-14987
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Stephen Yuan Jiang
>         Attachments: 14987-suggest.txt, 14987-v1.txt, 14987-v2.txt, 14987-v2.txt
> One customer encountered the following error when replaying recovered edits, leading
to region open failure:
> {code}
> region=table1,d6b-2282-9223370590058224807-U-9856557-EJ452727-16313786400171,1449616291799.fa8a526f2578eb3630bb08a4b1648f5d.,
starting to roll back the global memstore size.
> org.apache.hadoop.hbase.regionserver.WrongRegionException: Compaction marker from WAL
table_name: "table1"
> encoded_region_name: "d389c70fde9ec07971d0cfd20ef8f575"
> ...
> region_name: "table1,d6b-2282-9223370590058224807-U-9856557-EJ452727-16313786400171,1449089609367.d389c70fde9ec07971d0cfd20ef8f575."
>  targetted for region d389c70fde9ec07971d0cfd20ef8f575 does not match this region: {ENCODED
=> fa8a526f2578eb3630bb08a4b1648f5d, NAME => 'table1,d6b-2282-                     
STARTKEY => 'd6b-2282-9223370590058224807-U-9856557-EJ452727-16313786400171',
ENDKEY => 'd76-2553-9223370588576178807-U-7416904-EK875822-17662180600000'}
>   at org.apache.hadoop.hbase.regionserver.HRegion.checkTargetRegion(HRegion.java:4592)
>   at org.apache.hadoop.hbase.regionserver.HRegion.replayWALCompactionMarker(HRegion.java:3831)
>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3747)
>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3601)
>   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:911)
>   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:789)
>   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:762)
>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5774)
>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5744)
> {code}
> This was likely caused by the following action of hbck:
> {code}
> 15/12/08 18:11:34 INFO util.HBaseFsck: [hbasefsck-pool1-t37] Moving files from hdfs://Zealand/hbase/data/default/table1/d389c70fde9ec07971d0cfd20ef8f575/recovered.edits
into containing region hdfs://Zealand/hbase/data/default/table1/fa8a526f2578eb3630bb08a4b1648f5d/recovered.edits
> {code}
> The recovered.edits for d389c70fde9ec07971d0cfd20ef8f575 contained compaction marker
which couldn't be replayed against fa8a526f2578eb3630bb08a4b1648f5d
