Return-Path: 
X-Original-To: apmail-hive-issues-archive@minotaur.apache.org
Delivered-To: apmail-hive-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id EA06318B0A for ; Thu, 17 Dec 2015 02:41:47 +0000 (UTC)
Received: (qmail 27567 invoked by uid 500); 17 Dec 2015 02:41:47 -0000
Delivered-To: apmail-hive-issues-archive@hive.apache.org
Received: (qmail 27458 invoked by uid 500); 17 Dec 2015 02:41:47 -0000
Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
List-Help: 
List-Unsubscribe: 
List-Post: 
List-Id: 
Reply-To: dev@hive.apache.org
Delivered-To: mailing list issues@hive.apache.org
Received: (qmail 27249 invoked by uid 99); 17 Dec 2015 02:41:47 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 17 Dec 2015 02:41:47 +0000
Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id D58E72C1F72 for ; Thu, 17 Dec 2015 02:41:46 +0000 (UTC)
Date: Thu, 17 Dec 2015 02:41:46 +0000 (UTC)
From: "Eugene Koifman (JIRA)" 
To: issues@hive.apache.org
Message-ID: 
In-Reply-To: 
References: 
Subject: [jira] [Updated] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-12352:
----------------------------------
    Description: 
Worker will start with the DB in state X (w.r.t. this partition). While it's working, more txns will happen against the partition it's compacting; markCleaned() will then delete state up to X and everything since. There may be new delta files created between compaction starting and cleaning. These will not be compacted until more transactions happen.
So this ideally should only delete up to the TXN_ID that was compacted (i.e. the HWM in Worker?). Then this can also run at READ_COMMITTED. This means we'd want to store the HWM in COMPACTION_QUEUE when Worker picks up the job.

Actually the problem is even worse (but also solved using the HWM as above): suppose some transactions (against the same partition) have started and aborted since the time Worker ran the compaction job. That means there are never-compacted delta files with data that belongs to these aborted txns. The following will pick up these aborted txns:

  s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
      TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
      info.tableName + "'";
  if (info.partName != null) s += " and tc_partition = '" + info.partName + "'";

The logic after that will delete the relevant data from TXN_COMPONENTS, and if one of these txns becomes empty it will be picked up by cleanEmptyAbortedTxns(). At that point any metadata about an aborted txn is gone and the system will think it's committed.

The HWM in this case would be (in ValidCompactorTxnList):

  if (minOpenTxn > 0)
    min(highWaterMark, minOpenTxn)
  else
    highWaterMark


  was:
Worker will start with the DB in state X (w.r.t. this partition). While it's working, more txns will happen against the partition it's compacting; markCleaned() will then delete state up to X and everything since. There may be new delta files created between compaction starting and cleaning. These will not be compacted until more transactions happen. So this ideally should only delete up to the TXN_ID that was compacted (i.e. the HWM in Worker?). Then this can also run at READ_COMMITTED. This means we'd want to store the HWM in COMPACTION_QUEUE when Worker picks up the job.

Actually the problem is even worse (but also solved using the HWM as above): suppose some transactions (against the same partition) have started and aborted since the time Worker ran the compaction job.
That means there are never-compacted delta files with data that belongs to these aborted txns. The following will pick up these aborted txns:

  s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
      TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
      info.tableName + "'";
  if (info.partName != null) s += " and tc_partition = '" + info.partName + "'";

The logic after that will delete the relevant data from TXN_COMPONENTS, and if one of these txns becomes empty it will be picked up by cleanEmptyAbortedTxns(). At that point any metadata about an aborted txn is gone and the system will think it's committed.


> CompactionTxnHandler.markCleaned() may delete too much
> ------------------------------------------------------
>
>                 Key: HIVE-12352
>                 URL: https://issues.apache.org/jira/browse/HIVE-12352
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Blocker
>
> Worker will start with DB in state X (wrt this partition).
> while it's working more txns will happen, against partition it's compacting.
> then this will delete state up to X and since then. There may be new delta files created
> between compaction starting and cleaning. These will not be compacted until more
> transactions happen. So this ideally should only delete
> up to TXN_ID that was compacted (i.e. HWM in Worker?) Then this can also run
> at READ_COMMITTED. So this means we'd want to store HWM in COMPACTION_QUEUE when
> Worker picks up the job.
> Actually the problem is even worse (but also solved using HWM as above):
> Suppose some transactions (against same partition) have started and aborted since the time Worker ran compaction job.
> That means there are never-compacted delta files with data that belongs to these aborted txns.
> Following will pick up these aborted txns.
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
>     TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
>     info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + info.partName + "'";
> The logic after that will delete relevant data from TXN_COMPONENTS and if one of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). At that point any metadata about an Aborted txn is gone and the system will think it's committed.
> HWM in this case would be (in ValidCompactorTxnList)
> if(minOpenTxn > 0)
>   min(highWaterMark, minOpenTxn)
> else
>   highWaterMark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
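[Editor's note] The HWM rule quoted in the issue can be sketched as a small helper. This is a hedged illustration of the pseudocode in the description only, not Hive's actual ValidCompactorTxnList code; the class name CompactionHwmSketch and method name safeCleanUpTo are hypothetical.

```java
// Hypothetical sketch of the HWM rule from the issue description:
//   if (minOpenTxn > 0) min(highWaterMark, minOpenTxn) else highWaterMark
// The idea: the Cleaner should only delete metadata/deltas for txns up to
// the high-water mark the Worker saw, capped by the smallest still-open txn,
// so later deltas and aborted-txn records are not swept away prematurely.
public class CompactionHwmSketch {

    /**
     * Returns the highest txn id up to which cleaning is safe.
     *
     * @param highWaterMark highest txn id seen when Worker picked up the job
     * @param minOpenTxn    smallest open txn id at that time, or a
     *                      non-positive value when no txn was open
     */
    static long safeCleanUpTo(long highWaterMark, long minOpenTxn) {
        if (minOpenTxn > 0) {
            // An open txn below the HWM caps the cleanable range.
            return Math.min(highWaterMark, minOpenTxn);
        }
        // No open txns: the full high-water mark is cleanable.
        return highWaterMark;
    }

    public static void main(String[] args) {
        System.out.println(safeCleanUpTo(100, 40)); // open txn caps the range
        System.out.println(safeCleanUpTo(100, 0));  // no open txns
    }
}
```

Storing this value in COMPACTION_QUEUE when the Worker starts (as the description proposes) would let markCleaned() bound its deletes by it instead of deleting everything it finds.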