Date: Fri, 3 Jun 2016 14:12:59 +0000 (UTC)
From: "Stephen Yuan Jiang (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-15940) HBCK unnecessary moves reference files when a table has split region to fix non-existing overlap regions

    [ https://issues.apache.org/jira/browse/HBASE-15940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314161#comment-15314161 ]

Stephen Yuan Jiang commented on HBASE-15940:
--------------------------------------------

After HBCK identifies the overlapped regions,
it first takes those regions offline, then moves the files in those regions into a new region to repair the overlap. So the real root cause is that the catalog janitor runs during the repair. Another possible solution (simpler and safer) is to disable the catalog janitor while HBCK runs.

I still believe that we should not copy reference files around: either the files they reference are copied to the same location, so the reference files are unneeded, or the reference files are orphaned and should be removed anyway. In both cases, reference files are not useful in a new region.

Attached is a patch with both changes (unit testing indicates no problem when reference files are skipped, but this is risky and needs more testing).

> HBCK unnecessary moves reference files when a table has split region to fix non-existing overlap regions
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15940
>                 URL: https://issues.apache.org/jira/browse/HBASE-15940
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>    Affects Versions: 1.0.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>         Attachments: org.apache.hadoop.hbase.util.TestHBaseFsck-output.txt, repro-hbck-repair-healthy-splitted=region.patch, skipReferenceFiles.patch
>
>
> When the repair option (specifically the -fixHdfsOverlaps option) is specified against a table, and the table has split regions (both the parent region and the child regions exist, with reference files), Hbck wrongly concludes that overlapped regions exist and tries to merge them to fix it.
> This is by design, as the current implementation of Hbck uses HDFS as the trusted source without consulting the META table.
> Here are the comments from one of the unit tests:
> {code}
> // TODO: fixHdfsHoles does not work against splits, since the parent dir lingers on
> // for some time until children references are deleted. HBCK erroneously sees this as
> // overlapping regions
> {code}
> However, this is undesirable.
> When the reference files are moved to a new region, the parent region has no daughter regions left, so it can be cleaned up by CatalogJanitor. This creates a real inconsistency: lingering reference files.
> Another bad consequence is that the split regions are merged back into one. Although this is undesirable, at least it does not cause further inconsistency. This JIRA does not try to solve the unsplit issue, as that requires a bigger design change in Hbck.
> This JIRA addresses the potential lingering reference files issue, as multiple customers using branch-1 have hit it in production. (The workaround is to run a major compaction on all split regions before running HBCK; this can take a long time and have production impact.)
> Attached are the log and a modified unit test to repro the issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
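
The comment's suggestion to skip reference files when HBCK moves store files into the repaired region could be sketched roughly as below. This is a minimal illustration in plain Java, not the attached skipReferenceFiles.patch: the class and method names are hypothetical, and the naming rule used to spot a reference file (`<hfile name>.<parent region encoded name>`) is an assumption made for the example rather than HBase's actual check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: decide which store files to move into the merged region.
class SkipReferenceFilesSketch {
    // Assumption for illustration only: a reference file is named
    // "<hfile name>.<parent region encoded name>", both lowercase hex strings.
    private static final Pattern REFERENCE_NAME =
        Pattern.compile("^[0-9a-f]+\\.[0-9a-f]+$");

    // Returns true if the file name matches the assumed reference-file pattern.
    static boolean looksLikeReference(String fileName) {
        return REFERENCE_NAME.matcher(fileName).matches();
    }

    // Keeps only the files that do not look like reference files.
    static List<String> filesToMove(List<String> storeFileNames) {
        List<String> result = new ArrayList<>();
        for (String name : storeFileNames) {
            if (!looksLikeReference(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up file names: one plain hfile, one reference, one plain hfile.
        List<String> files = Arrays.asList("3f2a9c", "3f2a9c.7d1e4b", "a81b77");
        System.out.println(filesToMove(files)); // prints [3f2a9c, a81b77]
    }
}
```

Filtering before the move, rather than cleaning up afterwards, means the new region never holds a reference that could outlive its parent.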