hive-dev mailing list archives

From "Zhenxiao Luo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns
Date Sun, 30 Sep 2012 05:45:07 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466422#comment-13466422 ]

Zhenxiao Luo commented on HIVE-3467:
------------------------------------

Currently, BucketMapJoinOptimizer does not keep Partition information in its
aliasToPartitionBucketNumberMapping and aliasToPartitionBucketFileNamesMapping. Without
the partition column information, it cannot do the partition-aware optimization. How about
adding the Partition info to the maps:


-      LinkedHashMap<String, List<Integer>> aliasToPartitionBucketNumberMapping =
-          new LinkedHashMap<String, List<Integer>>();
-      LinkedHashMap<String, List<List<String>>> aliasToPartitionBucketFileNamesMapping =
-          new LinkedHashMap<String, List<List<String>>>();
+
+      // (alias to <Partition, BucketNumber>)
+      // AND (alias to <Partition, BucketFileNames>)
+      // one pair for each partition
+      // partition key/values info is needed in optimization
+      LinkedHashMap<String, List<Map<Partition, Integer>>>
+        aliasToPartitionBucketNumberMapping =
+        new LinkedHashMap<String, List<Map<Partition, Integer>>>();
+      LinkedHashMap<String, List<Map<Partition, List<String>>>>
+        aliasToPartitionBucketFileNamesMapping =
+        new LinkedHashMap<String, List<Map<Partition, List<String>>>>();
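
To illustrate how the extra Partition key could be used, here is a minimal sketch of the
proposed map shape. It is not the actual patch: PartSpec is a hypothetical stand-in for
Hive's Partition class, and addPartition and bucketFilesFor are helper names made up for
this example. It only shows that, once each entry carries its partition key/values, the
bucket files of a single partition can be looked up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hive's Partition class: only the partition
// key/value spec matters for this sketch.
final class PartSpec {
  final Map<String, String> spec;

  PartSpec(String key, String value) {
    this.spec = Collections.singletonMap(key, value);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof PartSpec && spec.equals(((PartSpec) o).spec);
  }

  @Override
  public int hashCode() {
    return spec.hashCode();
  }
}

final class PartitionAwareMapping {
  // alias -> one single-entry map per partition: <partition, bucket file names>
  final LinkedHashMap<String, List<Map<PartSpec, List<String>>>>
      aliasToPartitionBucketFileNamesMapping =
      new LinkedHashMap<String, List<Map<PartSpec, List<String>>>>();

  // Records the bucket files of one partition of the given alias.
  void addPartition(String alias, PartSpec part, List<String> bucketFiles) {
    List<Map<PartSpec, List<String>>> parts =
        aliasToPartitionBucketFileNamesMapping.get(alias);
    if (parts == null) {
      parts = new ArrayList<Map<PartSpec, List<String>>>();
      aliasToPartitionBucketFileNamesMapping.put(alias, parts);
    }
    parts.add(Collections.singletonMap(part, bucketFiles));
  }

  // Returns the bucket files of the alias that belong to the wanted partition,
  // or an empty list if the alias or partition is unknown.
  List<String> bucketFilesFor(String alias, PartSpec wanted) {
    List<String> result = new ArrayList<String>();
    List<Map<PartSpec, List<String>>> parts =
        aliasToPartitionBucketFileNamesMapping.get(alias);
    if (parts == null) {
      return result;
    }
    for (Map<PartSpec, List<String>> entry : parts) {
      List<String> files = entry.get(wanted);
      if (files != null) {
        result.addAll(files);
      }
    }
    return result;
  }
}
```

With the old List<List<String>> shape, the caller only sees a position in the list and
cannot tell which partition values a given set of bucket files belongs to.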

                
> BucketMapJoinOptimizer should optimize joins on partition columns
> -----------------------------------------------------------------
>
>                 Key: HIVE-3467
>                 URL: https://issues.apache.org/jira/browse/HIVE-3467
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Kevin Wilfong
>
> Consider the query:
> SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key;
> Where t1 and t2 are partitioned by part and bucketed by key.
> Suppose part takes the values 1 and 2, and t1 and t2 are each bucketed into 2 buckets.
> The bucket map join optimizer will put the first bucket of part=1 and part=2 partitions
> of t2 into the same mapper as that of part=1 partition of t1.  It will do the same for the
> part=2 partition of t1.
> It could take advantage of the partition values and send the first bucket of only the
> part=1 partitions of t1 and t2 into one mapper and the first bucket of only the part=2 partitions
> into another.
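
The difference described in the issue can be worked through with a small sketch. This is
an illustration only, not optimizer code: the class name, the smallTableInputs helper, and
the file naming scheme are all assumptions made for this example, and it assumes t1 and t2
have the same number of buckets so that bucket i of t1 pairs with bucket i of t2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BucketMapJoinExample {
  // Hypothetical file naming, for illustration only.
  static String bucketFile(String table, String part, int bucket) {
    return table + "/part=" + part + "/bucket_" + bucket;
  }

  // Bucket files of t2 sent to the mapper that reads the given bucket of
  // t1's partition bigPart. Without partition awareness, the same bucket of
  // every t2 partition is sent; with it, only the matching partition is.
  static List<String> smallTableInputs(String bigPart, int bucket,
      List<String> smallParts, boolean partitionAware) {
    List<String> inputs = new ArrayList<String>();
    for (String p : smallParts) {
      if (!partitionAware || p.equals(bigPart)) {
        inputs.add(bucketFile("t2", p, bucket));
      }
    }
    return inputs;
  }

  public static void main(String[] args) {
    List<String> parts = Arrays.asList("1", "2");
    for (String bigPart : parts) {
      String mapper = bucketFile("t1", bigPart, 0);
      System.out.println(mapper + " current: "
          + smallTableInputs(bigPart, 0, parts, false));
      System.out.println(mapper + " partition-aware: "
          + smallTableInputs(bigPart, 0, parts, true));
    }
  }
}
```

In this sketch, the mapper for the first bucket of t1's part=1 partition receives two t2
bucket files under the current behavior and one under the partition-aware behavior. The
dropped file belongs to part=2, so it can never satisfy t1.part = t2.part for that mapper.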

