Date: Mon, 17 Aug 2015 23:59:45 +0000 (UTC)
From: "Shannon Ladymon (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout

    [ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700477#comment-14700477 ]

Shannon Ladymon commented on HIVE-11317:
----------------------------------------

Doc note: I have added/updated documentation for *hive.timedout.txn.reaper.start* and *hive.timedout.txn.reaper.interval* on the following wiki pages:
* [Hive Transactions - Configuration | https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration]
* [Configuration Properties - Transactions and Compactor | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor]

If it looks okay, we can remove the TODOC1.3 label.

> ACID: Improve transaction Abort logic due to timeout
> ----------------------------------------------------
>
>                 Key: HIVE-11317
>                 URL: https://issues.apache.org/jira/browse/HIVE-11317
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>              Labels: TODOC1.3, triage
>             Fix For: 1.3.0
>
>         Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.4.patch, HIVE-11317.5.patch, HIVE-11317.6.patch, HIVE-11317.patch
>
>
> The logic to abort transactions that have stopped heartbeating is in TxnHandler.timeOutTxns().
> This is only called when DbTxnManager.getValidTxns() is called.
> So if there are a lot of txns that need to be timed out and there are no SQL clients talking to the system, there is nothing to abort dead transactions, and thus compaction can't clean them up, so garbage accumulates in the system.
> Also, the streaming API doesn't call DbTxnManager at all.
> Need to move this logic into Initiator (or some other metastore-side thread).
> Also, make sure it is broken up into multiple small(er) transactions against the metastore DB.
> Also move the timeOutLocks() logic there as well.
> See about adding a TXNS.COMMENT field, which can be used for "Auto aborted due to timeout", for example.
> The symptom of this is that the system keeps showing more and more Open transactions that never seem to go away (and have no locks associated with them).
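
For readers following the issue description above, here is a minimal Java sketch of the general idea it asks for: a background thread that periodically aborts transactions whose heartbeat has expired, working in small batches. This is not Hive's TxnHandler code. The class TxnReaperSketch, its methods, and the in-memory map standing in for the metastore TXNS table are all hypothetical. The startDelayMs and intervalMs parameters only loosely mirror the roles of hive.timedout.txn.reaper.start and hive.timedout.txn.reaper.interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a timed-out transaction reaper; not Hive's implementation. */
final class TxnReaperSketch {
    enum State { OPEN, ABORTED }

    /** Stand-in for one row of a transactions table. */
    static final class Txn {
        volatile long lastHeartbeatMs;
        volatile State state = State.OPEN;
        volatile String comment = "";
        Txn(long lastHeartbeatMs) { this.lastHeartbeatMs = lastHeartbeatMs; }
    }

    private final Map<Long, Txn> txns = new ConcurrentHashMap<>();
    private final long timeoutMs;
    private final int batchSize;

    TxnReaperSketch(long timeoutMs, int batchSize) {
        this.timeoutMs = timeoutMs;
        this.batchSize = batchSize;
    }

    void open(long id, long nowMs) { txns.put(id, new Txn(nowMs)); }

    void heartbeat(long id, long nowMs) {
        Txn t = txns.get(id);
        if (t != null && t.state == State.OPEN) t.lastHeartbeatMs = nowMs;
    }

    State stateOf(long id) { return txns.get(id).state; }
    String commentOf(long id) { return txns.get(id).comment; }

    /** Aborts every open txn whose last heartbeat is older than timeoutMs; returns the count. */
    synchronized int reapOnce(long nowMs) {
        List<Txn> expired = new ArrayList<>();
        for (Txn t : txns.values()) {
            if (t.state == State.OPEN && nowMs - t.lastHeartbeatMs > timeoutMs) expired.add(t);
        }
        int aborted = 0;
        // Each batch stands in for one small transaction against the backing database.
        for (int from = 0; from < expired.size(); from += batchSize) {
            int to = Math.min(from + batchSize, expired.size());
            for (Txn t : expired.subList(from, to)) {
                t.state = State.ABORTED;
                t.comment = "Auto aborted due to timeout";
                aborted++;
            }
        }
        return aborted;
    }

    /** Runs reapOnce on a daemon thread, independent of any client calls. */
    ScheduledExecutorService start(long startDelayMs, long intervalMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "txn-reaper-sketch");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleWithFixedDelay(() -> reapOnce(System.currentTimeMillis()),
                startDelayMs, intervalMs, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Because the reaper runs on its own schedule, dead transactions would be aborted even when no SQL client calls getValidTxns() and when only the streaming API is in use. Passing the clock value into reapOnce keeps the expiry logic easy to exercise without waiting for real time to pass.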