Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 26 Nov 2014 22:09:12 +0000 (UTC)
From: "Jihong Liu (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12757795.1416947573000.30281.1417039752573@Atlassian.JIRA>
In-Reply-To: <JIRA.12757795.1416947573000@Atlassian.JIRA>
References: <JIRA.12757795.1416947573000@Atlassian.JIRA>
 <JIRA.12757795.1416947573814@arcas>
Subject: [jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog
 streaming cannot be compacted
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226872#comment-14226872 ] 

Jihong Liu commented on HIVE-8966:
----------------------------------

Yes, I closed the transaction batch. I suggest making either of the following two changes, or both:

1. If a file is not a bucket file, don't try to compact it.
   In org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.java,
   change the following code:

  private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                              Map<Integer, BucketTracker> splitToBucketMap) {
      if (!matcher.find()) {
        LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
            file.toString());
      }

   .....
 to:
   private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                              Map<Integer, BucketTracker> splitToBucketMap) {
      if (!matcher.find()) {
        LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
            file.toString());
        return;
      }
     ....
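
To illustrate why the early return in suggestion 1 matters, here is a minimal standalone sketch. It assumes that the elided code goes on to read the matched bucket number from the matcher, and it uses an illustrative pattern (five trailing digits) rather than the actual Hive constant. The class and method names are hypothetical.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BucketMatchSketch {
    // Assumed pattern, for illustration only: a file name ending in five digits.
    // The real pattern is defined in Hive and may differ.
    static final Pattern BUCKET_DIGITS = Pattern.compile("[0-9]{5}$");

    // Guarded version: warn and bail out when the name carries no bucket number.
    static OptionalInt bucketOf(String fileName) {
        Matcher matcher = BUCKET_DIGITS.matcher(fileName);
        if (!matcher.find()) {
            System.err.println("Skipping non-bucket file: " + fileName);
            return OptionalInt.empty(); // without this return, group() below would throw
        }
        return OptionalInt.of(Integer.parseInt(matcher.group()));
    }

    // Unguarded version: group() is called even though find() failed.
    static boolean unguardedThrows(String fileName) {
        Matcher matcher = BUCKET_DIGITS.matcher(fileName);
        matcher.find();
        try {
            Integer.parseInt(matcher.group());
            return false;
        } catch (IllegalStateException e) {
            // Matcher.group() throws IllegalStateException when no match was found.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(bucketOf("bucket_00003"));
        System.out.println(bucketOf("bucket_00003_flush_length"));
        System.out.println(unguardedThrows("bucket_00003_flush_length"));
    }
}
```

Under these assumptions, logging the warning alone is not enough: the method has to stop before it touches the failed match.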

2. Don't use the bucket file pattern to name the "flush_length" file.
   In org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.java,
   change the following code:
   // Path is org.apache.hadoop.fs.Path
   static Path getSideFile(Path main) {
     return new Path(main + "_flush_length");
   }

to:
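 static Path getSideFile(Path main) {
   // Test the file name rather than the whole path, assuming main is a full path
   // that includes its parent directories (Path is org.apache.hadoop.fs.Path).
   String name = main.getName();
   if (name.startsWith("bucket_")) {
     // e.g. "bucket_00000" becomes "bkt_00000_flush_length" in the same directory
     return new Path(main.getParent(), "bkt" + name.substring(6) + "_flush_length");
   }
   return new Path(main + "_flush_length");
 }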
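
The effect of the renaming in suggestion 2 can be sketched on bare file names. This is only an illustration: the class, the helper names, and the prefix-based check are hypothetical stand-ins, not the Hive implementation, and the "bucket_" prefix is taken from the file names described in this issue.

```java
class SideFileNameSketch {
    static final String BUCKET_PREFIX = "bucket_";      // prefix seen in the delta directories
    static final String SIDE_SUFFIX = "_flush_length";  // suffix of the side file

    // Current naming: append the suffix, so the side file keeps the "bucket_" prefix.
    static String currentSideName(String bucketFileName) {
        return bucketFileName + SIDE_SUFFIX;
    }

    // Proposed naming: replace "bucket" with "bkt" so the prefix no longer matches.
    static String proposedSideName(String bucketFileName) {
        if (bucketFileName.startsWith(BUCKET_PREFIX)) {
            return "bkt" + bucketFileName.substring("bucket".length()) + SIDE_SUFFIX;
        }
        return bucketFileName + SIDE_SUFFIX;
    }

    // Hypothetical stand-in for a filter that selects bucket files by prefix.
    static boolean looksLikeBucketFile(String fileName) {
        return fileName.startsWith(BUCKET_PREFIX);
    }

    public static void main(String[] args) {
        String bucket = "bucket_00000";
        System.out.println(currentSideName(bucket) + " picked up: "
                + looksLikeBucketFile(currentSideName(bucket)));
        System.out.println(proposedSideName(bucket) + " picked up: "
                + looksLikeBucketFile(proposedSideName(bucket)));
    }
}
```

With the current naming, the side file is indistinguishable from a bucket file for any check that looks only at the prefix. With the proposed naming it is not.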
 
After making the above changes and recompiling hive-exec.jar, compaction works fine.


> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>
> Hive HCatalog streaming also creates a file named bucket_n_flush_length in each delta directory, where "n" is the bucket number. However, compactor.CompactorMR treats this file as one that needs to be compacted. Since the file cannot be compacted, compactor.CompactorMR does not go on to do the compaction.
> In a test, after I removed the bucket_n_flush_length file, "alter table partition compact" finished successfully. If that file is not deleted, nothing is compacted.
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)