Date: Wed, 26 Nov 2014 22:09:12 +0000 (UTC)
From: "Jihong Liu (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

    [ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226872#comment-14226872 ]

Jihong Liu commented on HIVE-8966:
----------------------------------

Yes, I closed the transaction batch. I suggest making either of the following two updates, or both:

1. If a file is not a bucket file, don't try to compact it.
So update the following code in org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.java. Change:

    private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                              Map splitToBucketMap) {
      if (!matcher.find()) {
        LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
            file.toString());
      }
      .....

to:

    private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                              Map splitToBucketMap) {
      if (!matcher.find()) {
        LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
            file.toString());
        return;
      }
      ....

2. Don't use the bucket file pattern to name the "flush_length" file.
So update the following code in org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.java. Change:

    static Path getSideFile(Path main) {
      return new Path(main + "_flush_length");
    }

to:

    static Path getSideFile(Path main) {
      if (main.toString().startsWith("bucket_")) {
        return new Path("bkt" + main.toString().substring(6) + "_flush_length");
      } else {
        return new Path(main + "_flush_length");
      }
    }

After making the above updates and recompiling hive-exec.jar, the compaction works fine.

> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>
> Hive HCatalog streaming also creates a file like bucket_n_flush_length in each delta directory, where "n" is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. This file of course cannot be compacted, so compactor.CompactorMR does not continue with the compaction.
> Did a test: after removing the bucket_n_flush_length file, "alter table partition compact" finished successfully. If that file is not deleted, nothing is compacted.
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
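
For reference, a minimal standalone sketch of the two suggestions from the comment, written against plain file names so it runs without Hive on the classpath. The class name, the strict pattern, and the helpers isBucketFile, filesToCompact and sideFileName are all hypothetical and are not part of Hive. The sketch also assumes the "bucket_" check is applied to the file name rather than to a full path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class BucketFileSketch {
    // Hypothetical strict pattern: the whole name must be "bucket_" plus digits.
    // This is an illustration, not the pattern Hive itself uses.
    static final Pattern STRICT_BUCKET = Pattern.compile("bucket_[0-9]+");

    // True only when the entire file name is a bucket file name.
    static boolean isBucketFile(String fileName) {
        return STRICT_BUCKET.matcher(fileName).matches();
    }

    // Suggestion 1: skip anything that is not a bucket file instead of
    // carrying on after the warning.
    static List<String> filesToCompact(List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            if (!isBucketFile(name)) {
                continue; // plays the role of the early "return" in the comment
            }
            result.add(name);
        }
        return result;
    }

    // Suggestion 2: give the side file a name that no longer starts with
    // "bucket_". "bucket_00000".substring(6) is "_00000".
    static String sideFileName(String fileName) {
        if (fileName.startsWith("bucket_")) {
            return "bkt" + fileName.substring(6) + "_flush_length";
        }
        return fileName + "_flush_length";
    }

    public static void main(String[] args) {
        List<String> delta = Arrays.asList("bucket_00000", "bucket_00000_flush_length");
        System.out.println(filesToCompact(delta));
        System.out.println(sideFileName("bucket_00000"));
    }
}
```

Either change alone keeps the side file out of the set handed to compaction in this sketch: the first by filtering it out, the second by renaming it so it no longer looks like a bucket file.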