hive-issues mailing list archives

From "Lefty Leverenz (JIRA)" <>
Subject [jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
Date Wed, 22 Jul 2015 05:34:05 GMT


Lefty Leverenz updated HIVE-10673:
    Labels: TODOC1.3  (was: )

> Dynamically partitioned hash join for Tez
> -----------------------------------------
>                 Key: HIVE-10673
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning, Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>              Labels: TODOC1.3
>             Fix For: 1.3.0, 2.0.0
>         Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12,
HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch,
HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch
> Analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found that about 2/3 of the
CPU time was spent on sorting/merging.
> Although MR cannot do this, other execution engines (such as Tez) can run a reduce-side
join on unsorted inputs, which eliminates the sorting and may be faster than a shuffle join.
To join unsorted inputs, we can use the hash join algorithm to perform the join in the
reducer. For this to work, the small tables in the join must fit in the reducer/hash table.
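
The idea in the description can be illustrated with a minimal sketch. This is not Hive's or Tez's implementation; the function names (partition, hash_join_in_reducer, partitioned_hash_join) and the row-as-dict representation are assumptions made purely for illustration. Both inputs are hash-partitioned on the join key without sorting, and each "reducer" builds an in-memory hash table from its slice of the small table, then streams its slice of the big table against it.

```python
from collections import defaultdict


def partition(rows, key, num_reducers):
    """Route each row to a reducer by hashing its join key (no sorting)."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[hash(row[key]) % num_reducers].append(row)
    return buckets


def hash_join_in_reducer(small_rows, big_rows, key):
    """Build a hash table from the small side, then probe it with the big side."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)  # the small side must fit in memory
    joined = []
    for row in big_rows:  # the big side is streamed in arrival order, unsorted
        for match in table.get(row[key], ()):
            joined.append({**match, **row})
    return joined


def partitioned_hash_join(small, big, key, num_reducers=4):
    """Inner join: partition both sides the same way, then join per reducer."""
    small_parts = partition(small, key, num_reducers)
    big_parts = partition(big, key, num_reducers)
    out = []
    for small_part, big_part in zip(small_parts, big_parts):
        out.extend(hash_join_in_reducer(small_part, big_part, key))
    return out
```

Because rows with equal keys always hash to the same reducer, each reducer only needs the matching slice of the small table in its hash table, which is why the memory requirement applies per reducer rather than to the whole small table.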

This message was sent by Atlassian JIRA
