hive-issues mailing list archives

From "Lefty Leverenz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join
Date Tue, 24 Mar 2015 04:03:52 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377237#comment-14377237 ]

Lefty Leverenz commented on HIVE-9277:
--------------------------------------

Doc note:  This adds *hive.mapjoin.hybridgrace.hashtable* and *hive.mapjoin.hybridgrace.memcheckfrequency*
to HiveConf.java, so they need to be documented in Configuration Properties for release 1.2.0.
The documentation should also state that this algorithm currently works only with Tez.

Does this also need general documentation?  Should the attached PDF file go in the wiki?

* [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]
* [Configuration Properties -- Tez (add new parameters to list of links) | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez]
* [Hive Joins | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins]

> Hybrid Hybrid Grace Hash Join
> -----------------------------
>
>                 Key: HIVE-9277
>                 URL: https://issues.apache.org/jira/browse/HIVE-9277
>             Project: Hive
>          Issue Type: New Feature
>          Components: Physical Optimizer
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>              Labels: TODOC1.2, join
>             Fix For: 1.2.0
>
>         Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, HIVE-9277.03.patch, HIVE-9277.04.patch,
>                      HIVE-9277.05.patch, HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch,
>                      HIVE-9277.13.patch, HIVE-9277.14.patch, HIVE-9277.15.patch,
>                      High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace hash join”_.
> This feature offers the following benefits:
> * The query will not fail even if the estimated memory requirement is slightly wrong.
> * Expensive garbage collection overhead can be avoided when the hash table grows.
> * The join can still be executed with a Map join operator even when the small table doesn't
>   fit in memory, because spilling some data from the build and probe sides is still cheaper
>   than shuffling the large fact table.
> The design is based on Hadoop’s parallel processing capability and the significant amount
> of memory available.
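
The spilling idea in the quoted description can be illustrated with a small sketch. This is
not Hive's implementation and is not taken from the attached patches or design PDF; it is a
generic, simplified hash join with partition spilling, written under stated assumptions. The
function name, the num_partitions and memory_budget parameters, and the use of Python lists
as a stand-in for on-disk spill files are all hypothetical.

```python
def hybrid_grace_hash_join(build_rows, probe_rows, num_partitions=4, memory_budget=3):
    """Inner-join two lists of (key, value) rows on key.

    Returns a list of (key, build_value, probe_value) tuples.
    memory_budget is a hypothetical cap on build rows held in memory.
    """
    # Build phase: hash-partition the small (build) side.
    in_memory = {p: {} for p in range(num_partitions)}  # partition -> key -> values
    spilled_build = {}  # partition -> rows; stands in for spill files on disk
    rows_in_memory = 0

    for key, value in build_rows:
        p = hash(key) % num_partitions
        if p in spilled_build:
            # This partition was already evicted, so new rows go straight to "disk".
            spilled_build[p].append((key, value))
            continue
        in_memory[p].setdefault(key, []).append(value)
        rows_in_memory += 1
        # Over budget: spill the largest in-memory partition instead of failing.
        while rows_in_memory > memory_budget:
            victim = max(in_memory,
                         key=lambda q: sum(len(v) for v in in_memory[q].values()))
            table = in_memory.pop(victim)
            rows = [(k, v) for k, vs in table.items() for v in vs]
            spilled_build[victim] = rows
            rows_in_memory -= len(rows)

    # Probe phase: join against in-memory partitions right away,
    # and set aside probe rows whose build partition was spilled.
    results = []
    spilled_probe = {p: [] for p in spilled_build}
    for key, value in probe_rows:
        p = hash(key) % num_partitions
        if p in spilled_build:
            spilled_probe[p].append((key, value))
        else:
            for build_value in in_memory[p].get(key, ()):
                results.append((key, build_value, value))

    # Second pass: process each spilled partition pair on its own.
    for p, build_part in spilled_build.items():
        table = {}
        for k, v in build_part:
            table.setdefault(k, []).append(v)
        for k, probe_value in spilled_probe[p]:
            for build_value in table.get(k, ()):
                results.append((k, build_value, probe_value))
    return results
```

With a generous memory_budget nothing is spilled and this behaves like an ordinary in-memory
hash join; with a small budget only the evicted partitions take the slower second pass, and
the large probe side is never reshuffled as a whole.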



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
