hive-dev mailing list archives

From "Jimmy Xiang (JIRA)" <>
Subject [jira] [Updated] (HIVE-8638) Implement bucket map join optimization [Spark Branch]
Date Mon, 08 Dec 2014 22:53:12 GMT


Jimmy Xiang updated HIVE-8638:
    Attachment: HIVE-8638.5-spark.patch

Added patch v5, which adds some comments and fixes the golden file for auto_sortmerge_join_11.q.
The golden file changes because we now do the bucket map join optimization whenever
hive.optimize.bucketmapjoin is set. Originally, the optimization was done only if a mapjoin
hint was set. This test would arguably be better named as a bucket mapjoin test.

> Implement bucket map join optimization [Spark Branch]
> -----------------------------------------------------
>                 Key: HIVE-8638
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Na Yang
>            Assignee: Jimmy Xiang
>             Fix For: spark-branch
>         Attachments: HIVE-8638.4-spark.patch, HIVE-8638.5-spark.patch
> In the hive-on-mr implementation, bucket map join optimization depends on the map join hint.
> In the hive-on-tez implementation, by contrast, a join can be automatically converted to a
> bucket map join if certain conditions are met, such as:
> 1. the optimization flag hive.convert.join.bucket.mapjoin.tez is ON
> 2. all join tables are bucketed and each small table's bucket number can be divided by the
> big table's bucket number
> 3. bucket columns == join columns
> In the hive-on-spark implementation, it would be ideal to support automatic bucket map join
> conversion: when all the required criteria are met, a join can be automatically converted to
> a bucket map join.
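
For illustration only, the three conditions listed in the description can be sketched as a small eligibility check. This is not Hive code: `TableInfo` and `can_convert_to_bucket_map_join` are hypothetical names. Two points are assumptions. Condition 2 is read as "one bucket count is a whole multiple of the other", since the description does not pin down the direction. Condition 3 is read as an exact, ordered match between bucket columns and join columns.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical summary of one join input (not a Hive class)."""
    name: str
    num_buckets: int             # 0 means the table is not bucketed
    bucket_cols: Sequence[str]   # columns the table is bucketed on
    join_cols: Sequence[str]     # columns this table contributes to the join key


def can_convert_to_bucket_map_join(flag_on: bool,
                                   big: TableInfo,
                                   small_tables: List[TableInfo]) -> bool:
    """Return True if the three conditions from the description hold."""
    # Condition 1: the optimization flag must be on.
    if not flag_on:
        return False

    for table in [big, *small_tables]:
        # Condition 2 (first half): every join table must be bucketed.
        if table.num_buckets <= 0:
            return False
        # Condition 3: bucket columns == join columns
        # (assumed here to mean an exact, ordered match).
        if list(table.bucket_cols) != list(table.join_cols):
            return False

    for small in small_tables:
        # Condition 2 (second half): bucket counts must divide evenly.
        # Assumption: either direction is accepted.
        larger = max(big.num_buckets, small.num_buckets)
        smaller = min(big.num_buckets, small.num_buckets)
        if larger % smaller != 0:
            return False

    return True
```

For example, under these assumptions a big table with 4 buckets and a small table with 2 buckets, both bucketed and joined on the same column, would qualify, while a small table with 3 buckets would not.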

This message was sent by Atlassian JIRA
