hbase-issues mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15940) HBCK unnecessary moves reference files when a table has split region to fix non-existing overlap regions
Date Fri, 03 Jun 2016 23:08:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315073#comment-15315073 ]

Stephen Yuan Jiang commented on HBASE-15940:
--------------------------------------------

The existing flow is: take the region offline, move some of the files in the overlapped
regions to a new region, then sideline the old region directory.  So any files (e.g. the old
.regioninfo file) that are not moved will be sidelined.  I don't have to write any new code for this.
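
To make the flow above concrete, here is a minimal sketch on a local filesystem. It is not the HBCK code: the class name, method name, and directory layout are hypothetical, and java.nio.file stands in for HDFS. It only shows why anything left behind in the old region directory (such as .regioninfo) ends up sidelined without extra code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Set;

public class SidelineSketch {
    /**
     * Hypothetical helper: moves the selected files from oldRegion into newRegion,
     * then moves the whole old region directory (with whatever was not moved)
     * under sidelineRoot. Returns the sidelined location.
     */
    static Path mergeAndSideline(Path oldRegion, Path newRegion, Path sidelineRoot,
                                 Set<String> filesToMove) throws IOException {
        Files.createDirectories(newRegion);
        for (String name : filesToMove) {
            Path src = oldRegion.resolve(name);
            if (Files.exists(src)) {
                Files.move(src, newRegion.resolve(name));
            }
        }
        // Everything still in oldRegion travels with the directory itself.
        Files.createDirectories(sidelineRoot);
        Path target = sidelineRoot.resolve(oldRegion.getFileName());
        Files.move(oldRegion, target);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hbck-sketch");
        Path oldRegion = Files.createDirectories(root.resolve("old-region"));
        Files.createFile(oldRegion.resolve(".regioninfo"));
        Files.createFile(oldRegion.resolve("hfile-1"));

        Path newRegion = root.resolve("new-region");
        Path sidelined = mergeAndSideline(oldRegion, newRegion,
                root.resolve("sideline"), Collections.singleton("hfile-1"));

        System.out.println(Files.exists(newRegion.resolve("hfile-1")));   // moved file
        System.out.println(Files.exists(sidelined.resolve(".regioninfo"))); // sidelined leftover
        System.out.println(Files.exists(oldRegion));                       // old dir is gone
    }
}
```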

Yeah, I did see HBASE-15406 when I added the new admin.setCatalogJanitor(false) code.

> HBCK unnecessary moves reference files when a table has split region to fix non-existing overlap regions
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15940
>                 URL: https://issues.apache.org/jira/browse/HBASE-15940
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>    Affects Versions: 1.0.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>         Attachments: org.apache.hadoop.hbase.util.TestHBaseFsck-output.txt, repro-hbck-repair-healthy-splitted=region.patch, skipReferenceFiles.patch
>
>
> When a repair option (specifically the -fixHdfsOverlaps option) is specified against a table that has split regions (both the parent region and the child regions exist, with reference files), Hbck wrongly concludes that overlapped regions exist and tries to merge them to fix it.
> This is by design, as the current implementation of Hbck uses HDFS as the trusted source without consulting the META table.
> Here is the comment from one of the unit tests:
> {code}
>       // TODO: fixHdfsHoles does not work against splits, since the parent dir lingers on
>       // for some time until children references are deleted. HBCK erroneously sees this as
>       // overlapping regions
> {code}
> However, this is undesirable.  When the reference files are moved to a new region, the parent region has no daughter regions left and hence could be cleaned up by CatalogJanitor.  This would create a real inconsistency: lingering reference files.
> Another bad consequence is that we would merge the split regions back into one.  Even though this is undesirable, at least it would not cause more inconsistency.  This JIRA does not try to solve this unsplit issue, as it requires a bigger design change in Hbck.
> This JIRA tries to address the potential lingering reference files issue, as multiple customers using branch-1 hit it in production.  (The workaround is to run a major compaction on all split regions before running HBCK; this could take a long time and have production impact.)
> Attached are the log and a modified unit test to repro the issue.
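
The false overlap described in the quoted report can be illustrated with a small sketch. This is not Hbck's implementation: the Region class, the splitParent flag, and findOverlaps are hypothetical names, keys are plain non-empty strings, and the flag stands in for whatever source identifies a split parent. It only shows that a lingering parent directory necessarily overlaps both of its daughters when key ranges alone are compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverlapSketch {
    /** Hypothetical region descriptor covering the key range [startKey, endKey). */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;
        final boolean splitParent; // assumed flag, not read from HDFS or META here

        Region(String name, String startKey, String endKey, boolean splitParent) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.splitParent = splitParent;
        }
    }

    /** Two half-open ranges overlap when each starts before the other ends. */
    static boolean overlaps(Region a, Region b) {
        return a.startKey.compareTo(b.endKey) < 0 && b.startKey.compareTo(a.endKey) < 0;
    }

    /** Returns "x/y" for every overlapping pair, optionally ignoring split parents. */
    static List<String> findOverlaps(List<Region> regions, boolean skipSplitParents) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < regions.size(); i++) {
            for (int j = i + 1; j < regions.size(); j++) {
                Region a = regions.get(i);
                Region b = regions.get(j);
                if (skipSplitParents && (a.splitParent || b.splitParent)) {
                    continue;
                }
                if (overlaps(a, b)) {
                    result.add(a.name + "/" + b.name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("parent", "a", "c", true),
                new Region("daughterA", "a", "b", false),
                new Region("daughterB", "b", "c", false));

        // Key ranges alone: the parent overlaps both daughters.
        System.out.println(findOverlaps(regions, false));
        // Ignoring the split parent: the daughters tile the range with no overlap.
        System.out.println(findOverlaps(regions, true));
    }
}
```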



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
