Date: Thu, 14 Dec 2017 09:53:05 +0000 (UTC)
From: "Appy (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-19457) Debugging flaky TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits

    [ https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290632#comment-16290632 ]

Appy commented on HBASE-19457:
------------------------------

After more debugging, I think I finally have a fix (sorry for being slow; I'm just beginning to understand AM, the AssignmentManager).

So the issue is:
We delete the table's state from meta (in step [TRUNCATE_TABLE_REMOVE_FROM_META|https://github.com/apache/hbase/blob/7466e64abb2c68c8a0f40f6051e4b5bf550e69bd/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TruncateTableProcedure.java#L102]).
On recovery, TableStateManager#fixTableStates assumes that a missing state means the table is enabled ([here|https://github.com/apache/hbase/blob/7466e64abb2c68c8a0f40f6051e4b5bf550e69bd/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java#L218]).
Later we add regions back to meta and crash after that. On recovery, AM sees these regions, looks up the table state, finds it ENABLED, starts assigning them, and screws up.
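To make the sequence above concrete, here is a toy model of it. This is NOT HBase code: the class TruncateRecoverySketch, the maps stateRows/regionRows standing in for hbase:meta, and the simplified fixTableStates/regionsToAssign methods are all hypothetical. The default applied to a missing state is a parameter, so the behavior described here (missing means ENABLED) can be compared with the alternative of treating it as DISABLED.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of the truncate-then-crash recovery sequence; not HBase code. */
public class TruncateRecoverySketch {
    enum TableState { ENABLED, DISABLED }

    // Hypothetical stand-ins for the table-state rows and region rows in hbase:meta.
    static final Map<String, TableState> stateRows = new HashMap<>();
    static final Map<String, List<String>> regionRows = new HashMap<>();

    // Simplified recovery fixup: any known table without a state row gets the given default.
    static void fixTableStates(List<String> knownTables, TableState defaultForMissing) {
        for (String table : knownTables) {
            stateRows.putIfAbsent(table, defaultForMissing);
        }
    }

    // Simplified AM behavior: regions of ENABLED tables get assigned, all others are skipped.
    static List<String> regionsToAssign() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : regionRows.entrySet()) {
            if (stateRows.get(e.getKey()) == TableState.ENABLED) {
                out.addAll(e.getValue());
            }
        }
        return out;
    }

    static List<String> simulate(TableState defaultForMissing) {
        stateRows.clear();
        regionRows.clear();
        String table = "t1";
        stateRows.put(table, TableState.DISABLED);           // truncate starts on a disabled table
        stateRows.remove(table);                              // like TRUNCATE_TABLE_REMOVE_FROM_META
        regionRows.put(table, List.of("t1,,r1", "t1,m,r2"));  // regions re-added to meta
        // The master crashes here; on restart, recovery runs:
        fixTableStates(List.of(table), defaultForMissing);
        return regionsToAssign();
    }

    public static void main(String[] args) {
        System.out.println("missing=ENABLED  -> assigned " + simulate(TableState.ENABLED));
        System.out.println("missing=DISABLED -> assigned " + simulate(TableState.DISABLED));
    }
}
```

Running main prints "missing=ENABLED  -> assigned [t1,,r1, t1,m,r2]" followed by "missing=DISABLED -> assigned []": with the ENABLED default the half-truncated table's regions are handed to the assignment step, while with a DISABLED default they are left alone.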
Simple fix here would be: don't delete the table state from meta; just let it remain DISABLED.

-------

But CreateTableProcedure also adds regions to meta and then crashes. Why don't we see the same issue there?
It adds region rows to meta, but does not add any row for the table. On recovery, when AM looks up the table state corresponding to those regions, TSM#getTableState() throws TableNotFoundException, which gets caught [here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java#L135], and so on. The end result is that it ignores those regions.

----

Some bigger questions to ponder:
1) Should we really assume a missing state column means ENABLED? Assuming DISABLED is probably the more conservative and better choice, since it won't screw up the cluster. (The only other place that deletes the state column is hbck.)
2) Shouldn't new regions always be added with state CLOSED? (dev thread: http://mail-archives.apache.org/mod_mbox/hbase-dev/201712.mbox/browser)

> Debugging flaky TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-19457
>                 URL: https://issues.apache.org/jira/browse/HBASE-19457
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: Appy
>         Attachments: patch1, test-output.txt
>
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)