Date: Thu, 23 Feb 2017 14:11:44 +0000 (UTC)
From: "Abhishek Singh Chouhan (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-17682) Region stuck in merging_new state indefinitely

     [ https://issues.apache.org/jira/browse/HBASE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Singh Chouhan updated HBASE-17682:
-------------------------------------------
    Attachment: HBASE-17682.master.001.patch

> Region stuck in merging_new state indefinitely
> ----------------------------------------------
>
>                 Key: HBASE-17682
>                 URL: https://issues.apache.org/jira/browse/HBASE-17682
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Abhishek Singh Chouhan
>         Attachments: HBASE-17682.branch-1.3.001.patch, HBASE-17682.master.001.patch
>
>
> I ran into this issue while experimenting with a chaos monkey that performed only splits, merges, and kills. Regions ended up stuck in transition in the MERGING_NEW state indefinitely. I think this happens when the region server (RS) is killed during the merge but before the point of no return (PONR), in which case the new region's state in the master is MERGING_NEW.
> When the region server dies at this point, the master executes RegionStates.serverOffline() for it, which does:
>     for (RegionState state : regionsInTransition.values()) {
>       HRegionInfo hri = state.getRegion();
>       if (assignedRegions.contains(hri)) {
>         // Region is open on this region server, but in transition.
>         // This region must be moving away from this server, or splitting/merging.
>         // SSH will handle it, either skip assigning, or re-assign.
>         LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
>       } else if (sn.equals(state.getServerName())) {
>         // Region is in transition on this region server, and this
>         // region is not open on this server. So the region must be
>         // moving to this server from another one (i.e. opening or
>         // pending open on this server, was open on another one.
>         // Offline state is also kind of pending open if the region is in
>         // transition. The region could be in failed_close state too if we have
>         // tried several times to open it while this region server is not reachable)
>         if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
>           LOG.info("Found region in " + state +
>             " to be reassigned by ServerCrashProcedure for " + sn);
>           rits.add(hri);
>         } else if(state.isSplittingNew()) {
>           regionsToCleanIfNoMetaEntry.add(state.getRegion());
>         } else {
>           LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
>         }
>       }
>     }
> We do not handle MERGING_NEW here, so we end up with "THIS SHOULD NOT HAPPEN: unexpected ...". After this, the new region, which does not have any data, stays stuck in transition, and that keeps the balancer from running.
> I think we should handle MERGING_NEW the same way as SPLITTING_NEW.
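
The suggestion above (treat MERGING_NEW like SPLITTING_NEW) can be illustrated with a small standalone sketch. This is not the attached patch and does not use the real HBase classes: `ServerOfflineSketch`, its `State` and `Action` enums, and `classify` are hypothetical stand-ins that model only the inner branch of the loop quoted above, under the assumption that a MERGING_NEW region would be queued for cleanup exactly as a SPLITTING_NEW region is.

```java
/** Standalone sketch; none of these types are the real HBase classes. */
class ServerOfflineSketch {

    /** Hypothetical subset of the region states mentioned in the issue. */
    enum State { PENDING_OPEN, OPENING, FAILED_CLOSE, OFFLINE, SPLITTING_NEW, MERGING_NEW, CLOSING }

    /** What the master would do with a region in transition on the dead server. */
    enum Action { REASSIGN, CLEAN_IF_NO_META_ENTRY, UNEXPECTED }

    /**
     * Models the branch taken when the region is in transition on the dead
     * server but is not open on it.
     */
    static Action classify(State state) {
        switch (state) {
            case PENDING_OPEN:
            case OPENING:
            case FAILED_CLOSE:
            case OFFLINE:
                // Mirrors rits.add(hri) in the quoted loop.
                return Action.REASSIGN;
            case SPLITTING_NEW:
            case MERGING_NEW: // Assumed change: same handling as SPLITTING_NEW.
                // Mirrors regionsToCleanIfNoMetaEntry.add(...) in the quoted loop.
                return Action.CLEAN_IF_NO_META_ENTRY;
            default:
                // Mirrors the "THIS SHOULD NOT HAPPEN" warning.
                return Action.UNEXPECTED;
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With this handling, a MERGING_NEW region no longer falls through to the warning branch. Whether cleaning it up when it has no meta entry is sufficient in a real cluster is for the actual patch and its review to establish.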