Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Tue, 17 Jul 2012 10:35:34 +0000 (UTC)
From: "Navis (JIRA)"
To: hive-dev@hadoop.apache.org
Message-ID: <1903343883.63171.1342521334786.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <2113057469.74267.1341016544133.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416071#comment-13416071 ]

Navis commented on HIVE-3218:
-----------------------------

For queries handling many partitions with many buckets, it may be necessary to apply option 3 in parallel. I'm considering that for a separate issue.

> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-3218
>                 URL: https://issues.apache.org/jira/browse/HIVE-3218
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Critical
>         Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which produces the following bucket join context:
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>     at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
>     ... 8 more
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
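
One plausible reading of the "Alias Bucket Output File Name Mapping" quoted in the report is that bucket files from different partitions of hive_test_smb_bucket1 map to the same bucket number, so more than one map task ends up targeting the same output file name (for example 000001_0, the name in the rename failure). The Python sketch below only illustrates that reading. The `output_name` and `find_collisions` helpers are hypothetical, and the assumption that the output name is derived from the bucket number alone is not confirmed by the report; this is not Hive's actual code.

```python
from collections import defaultdict

# Paths and bucket numbers copied from the mapping quoted in the issue.
bucket_mapping = {
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0": 0,
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0": 1,
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0": 0,
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0": 1,
}


def output_name(bucket):
    # Assumption (hypothetical): the output file name depends only on the
    # bucket number, zero-padded to six digits with a "_0" suffix.
    return f"{bucket:06d}_0"


def find_collisions(mapping):
    # Group input files by the output name they would be written to and
    # keep only the names claimed by more than one input file.
    by_output = defaultdict(list)
    for path, bucket in mapping.items():
        by_output[output_name(bucket)].append(path)
    return {name: paths for name, paths in by_output.items() if len(paths) > 1}


collisions = find_collisions(bucket_mapping)
for name, paths in sorted(collisions.items()):
    print(name, "<-", len(paths), "input files")
```

Under that assumption, both 000000_0 and 000001_0 are claimed by two input files, one from each partition, which would be consistent with the "Unable to rename output" error when a second task tries to commit to an already existing name.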