From: "Szehon Ho (JIRA)"
To: hive-dev@hadoop.apache.org
Date: Wed, 20 Aug 2014 03:39:18 +0000 (UTC)
Subject: [jira] [Updated] (HIVE-7384) Research into reduce-side join [Spark Branch]

    [ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szehon Ho updated HIVE-7384:
----------------------------
    Attachment: Hive on Spark Reduce Side Join.docx

> Research into reduce-side join [Spark Branch]
> ---------------------------------------------
>
>                 Key: HIVE-7384
>                 URL: https://issues.apache.org/jira/browse/HIVE-7384
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>         Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt
>
> Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB (sort-merge-bucket) map-side join, will work out of the box with our design, reduce-side join may be more complicated because it relies extensively on key tags and shuffle behavior. Our design principle is to make the Hive implementation work out of the box here as well, which might require new functionality from Spark. The task is to research this area, identifying the requirements for the Spark community and the work to be done on Hive to make reduce-side join work.
> A design doc might be needed for this. For more information, please refer to the overall design doc on the wiki.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
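The description says reduce-side join depends on key tags and shuffle behavior. As a rough illustration only (a simplified model, not Hive's actual operator code), the Python sketch below tags each row with the index of its input table, shuffles on the join key with rows ordered by tag, and in the reducer buffers every table except the last while streaming the last one. This mirrors the behavior described in the Hive join documentation, where the last table is streamed and earlier ones are buffered. The function names (map_phase, shuffle, reduce_phase, reduce_side_join) and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(tables):
    """Emit ((join_key, tag), row) pairs; the tag is the index of the input table."""
    for tag, (key_col, rows) in enumerate(tables):
        for row in rows:
            yield (row[key_col], tag), row


def shuffle(pairs):
    """Group rows by join key; sorting on (key, tag) orders each group by tag.

    The sort is stable, so rows with the same key and tag keep their input order.
    """
    groups = defaultdict(list)
    for (key, tag), row in sorted(pairs, key=lambda p: p[0]):
        groups[key].append((tag, row))
    return groups


def reduce_phase(groups, num_tables):
    """Inner join: buffer rows from tags 0..n-2, stream rows from the last tag."""
    last_tag = num_tables - 1
    for _key, tagged_rows in groups.items():
        buffers = [[] for _ in range(last_tag)]
        for tag, row in tagged_rows:
            if tag < last_tag:
                buffers[tag].append(row)
            else:
                # Tag ordering guarantees all buffered rows have already arrived,
                # so each streamed row can be joined and emitted immediately.
                # An empty buffer yields no combinations (inner-join semantics).
                for combo in product(*buffers):
                    yield (*combo, row)


def reduce_side_join(tables):
    """tables: list of (key_column_index, rows); the last table is streamed."""
    return list(reduce_phase(shuffle(map_phase(tables)), len(tables)))


# Hypothetical sample data: (store_id, city) and (store_id, item) rows.
stores = [(1, "Oslo"), (3, "Rome")]
items = [(1, "pen"), (2, "ink"), (1, "pad")]
print(reduce_side_join([(0, stores), (0, items)]))
```

The secondary ordering by tag is what lets the reducer finish buffering before any streamed row arrives. A plain group-by-key shuffle does not by itself guarantee that order of values within a group, which is the kind of shuffle behavior the description flags as a possible requirement for Spark.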