hive-issues mailing list archives

From "Jason Dere (JIRA)" <>
Subject [jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
Date Wed, 22 Jul 2015 00:44:05 GMT


Jason Dere updated HIVE-10673:
    Release Note: This adds the configuration parameter hive.optimize.dynamic.partition.hashjoin,
which enables selection of the dynamically partitioned hash join when using the Tez execution engine.

> Dynamically partitioned hash join for Tez
> -----------------------------------------
>                 Key: HIVE-10673
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning, Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>             Fix For: 1.3.0, 2.0.0
>         Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12,
HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch,
HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch
> Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found that about 2/3 of the
CPU time was spent on sorting/merging.
> While this does not work for MR, other execution engines (such as Tez) make it possible
to create a reduce-side join that uses unsorted inputs, eliminating the sorting; this may be
faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to
perform the join in the reducer. This requires the small tables in the join to fit in a hash
table in the reducer.
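
For illustration, the reducer-side hash join described above might look roughly like the sketch below. This is not the code from the attached patches: the class HashJoinSketch, the Row type, and the methods partitionFor and joinPartition are hypothetical names invented for this example, and it assumes an inner join on a single integer key. Rows from both tables are routed to a reducer by hashing the join key; each reducer builds a hash table from its slice of the small table and probes it with the big-table rows in whatever order they arrive, so no sorting is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Hive's or Tez's actual operator code.
class HashJoinSketch {

    // For this sketch a row is just a join key plus a payload string.
    static final class Row {
        final int key;
        final String payload;

        Row(int key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Rows with equal join keys, from either table, map to the same reducer.
    static int partitionFor(int key, int numReducers) {
        return Math.floorMod(Integer.hashCode(key), numReducers);
    }

    // Runs inside one reducer: build a hash table from the small side,
    // then probe it with the big side in arrival order (no sort required).
    static List<String> joinPartition(List<Row> smallSide, List<Row> bigSide) {
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row s : smallSide) {
            table.computeIfAbsent(s.key, k -> new ArrayList<>()).add(s);
        }

        List<String> joined = new ArrayList<>();
        for (Row b : bigSide) {
            List<Row> matches = table.get(b.key);
            if (matches == null) {
                continue; // inner join: unmatched big-side rows are dropped
            }
            for (Row s : matches) {
                joined.add(b.key + ":" + s.payload + "," + b.payload);
            }
        }
        return joined;
    }
}
```

The memory cost is one hash table per reducer holding that reducer's share of the small side, which is why the small tables have to fit for the approach to work.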

This message was sent by Atlassian JIRA
