Date: Wed, 22 Jul 2015 05:34:05 +0000 (UTC)
From: "Lefty Leverenz (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez

     [ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-10673:
----------------------------------
    Labels: TODOC1.3  (was: )

> Dynamically partitioned hash join for Tez
> -----------------------------------------
>
>                 Key: HIVE-10673
>                 URL: https://issues.apache.org/jira/browse/HIVE-10673
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning, Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>              Labels: TODOC1.3
>             Fix For: 1.3.0, 2.0.0
>
>         Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch
>
>
> Analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found that about 2/3 of the CPU time was spent on sorting/merging.
> While this does not work for MR, other execution engines (such as Tez) make it possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This requires the small tables in the join to fit in the reducer's hash table.
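
As a rough illustration of the idea in the description, the sketch below partitions both inputs by a hash of the join key without sorting them, then runs a hash join inside each simulated reducer. It is not Hive or Tez code. The function names, the tuple row layout with the join key in column 0, and the in-memory lists standing in for reducers are all assumptions made for this example.

```python
from collections import defaultdict


def partition(rows, key_index, num_reducers):
    """Route each row to a reducer by hashing its join key (no sorting)."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[hash(row[key_index]) % num_reducers].append(row)
    return buckets


def hash_join_in_reducer(small_rows, big_rows, small_key=0, big_key=0):
    """Inner join inside one reducer: build on the small side, probe with the big side."""
    # Build phase: the small side's partition must fit in memory.
    table = defaultdict(list)
    for row in small_rows:
        table[row[small_key]].append(row)

    # Probe phase: stream the big side in arrival order, unsorted.
    joined = []
    for row in big_rows:
        for match in table.get(row[big_key], ()):
            joined.append(match + row)
    return joined


def partitioned_hash_join(small, big, num_reducers=4):
    """Hypothetical driver: partition both inputs the same way, then join per reducer."""
    small_parts = partition(small, 0, num_reducers)
    big_parts = partition(big, 0, num_reducers)
    result = []
    for small_part, big_part in zip(small_parts, big_parts):
        result.extend(hash_join_in_reducer(small_part, big_part))
    return result


if __name__ == "__main__":
    small = [(1, "a"), (2, "b")]
    big = [(2, "x"), (1, "y"), (3, "z"), (1, "w")]
    for joined_row in sorted(partitioned_hash_join(small, big, num_reducers=2)):
        print(joined_row)
```

Because both inputs are routed with the same hash function, rows with equal keys always meet in the same reducer, so each reducer only needs a hash table for its own share of the small side. The output order depends on the partitioning, which is why the usage example sorts the result before printing.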