Date: Mon, 22 Jan 2018 23:47:00 +0000 (UTC)
From: "James Taylor (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Commented] (PHOENIX-4537) RegionServer initiating compaction can trigger schema migration and deadlock the system

    [ https://issues.apache.org/jira/browse/PHOENIX-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335178#comment-16335178 ]

James Taylor commented on PHOENIX-4537:
---------------------------------------

The upgrade *should not* happen from a server-side connection. We have code in place for this already. See QueryUtil.getConnectionOnServer():
{code}
public static Connection getConnectionOnServer(Properties props, Configuration conf)
        throws ClassNotFoundException, SQLException {
    UpgradeUtil.doNotUpgradeOnFirstConnection(props);
    return getConnection(props, conf);
}
{code}
Sounds like we need a bit more discussion on this one before committing given the comfort level, [~elserj]. What about the idea I outlined above of doing PHOENIX-4530 and removing the clearTsOnDisabledIndexes altogether?
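For readers following along, here is a minimal, self-contained sketch of the pattern that getConnectionOnServer() relies on: the server-side entry point sets a marker in the connection Properties, and the connection setup path checks that marker before attempting an upgrade. Everything in it is an assumption made for illustration, including the class name, the property key, and the stand-in openConnection() method; it is not Phoenix's actual implementation. Note also that, per the issue description quoted below, the namespace migration is a separate step from the upgrade, so a flag of this kind would not necessarily gate it.

```java
import java.util.Properties;

/**
 * Illustrative sketch only: a server-side connection marks its Properties
 * so that connection setup can skip the system-table upgrade.
 * All names here are hypothetical stand-ins, not Phoenix's real code.
 */
final class ServerConnectionSketch {
    // Hypothetical property key; the key Phoenix really uses may differ.
    static final String DO_NOT_UPGRADE = "sketch.doNotUpgrade";

    /** Marks the properties as belonging to a connection that must not upgrade. */
    static void doNotUpgradeOnFirstConnection(Properties props) {
        props.setProperty(DO_NOT_UPGRADE, Boolean.TRUE.toString());
    }

    /** Returns true when the caller asked to skip the upgrade. */
    static boolean isNoUpgradeSet(Properties props) {
        // parseBoolean(null) is false, so an unset key means "upgrade allowed".
        return Boolean.parseBoolean(props.getProperty(DO_NOT_UPGRADE));
    }

    /** Stand-in for connection setup: reports which path would be taken. */
    static String openConnection(Properties props) {
        return isNoUpgradeSet(props)
                ? "connected, upgrade skipped"
                : "connected, upgrade attempted";
    }

    /** Stand-in for a server-side entry point: sets the flag, then connects. */
    static String openConnectionOnServer(Properties props) {
        doNotUpgradeOnFirstConnection(props);
        return openConnection(props);
    }

    public static void main(String[] args) {
        System.out.println("client: " + openConnection(new Properties()));
        System.out.println("server: " + openConnectionOnServer(new Properties()));
    }
}
```

The design point is that the decision travels with the Properties object, so any code path that opens a connection on the server through the wrapper inherits the "no upgrade" behaviour without needing to know about it.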

> RegionServer initiating compaction can trigger schema migration and deadlock the system
> ---------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4537
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4537
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Romil Choksi
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 5.0.0, 4.14.0
>
>         Attachments: PHOENIX-4537.001.patch
>
>
> [~sergey.soldatov] has been doing some great digging around a test failure we've been seeing at $dayjob. The situation goes like this.
> 0. Run some arbitrary load
> 1. Stop HBase
> 2. Enable schema mapping ({{phoenix.schema.isNamespaceMappingEnabled=true}} and {{phoenix.schema.mapSystemTablesToNamespace=true}} in hbase-site.xml)
> 3. Start HBase
> 4. Circumstantially, have the SYSTEM.CATALOG table need a compaction to run before a client first connects
> When the RegionServer initiates the compaction, it will end up running {{UngroupedAggregateRegionObserver.clearTsOnDisabledIndexes}} which opens a Phoenix connection. While the RegionServer won't upgrade system tables, it *will* try to migrate them into the schema mapped variants (e.g. SYSTEM.CATALOG to SYSTEM:CATALOG).
> However, one of the first steps in the schema migration is to disable the SYSTEM.CATALOG table. The table can't be disabled until the region is CLOSED, and the region cannot be CLOSED until the compaction is finished. *deadlock*
> The "obvious" fix is to prevent RegionServers from triggering system table migrations, but Sergey and [~elserj] both think that this will end badly (RegionServers falling over because they expect the tables to be migrated and they aren't).
> Thoughts? [~ankit.singhal], [~jamestaylor], any others?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)