Date: Wed, 8 Feb 2017 23:04:42 +0000 (UTC)
From: "Xuefu Zhang (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-15683) Make what's done in HIVE-15580 for group by configurable

     [ https://issues.apache.org/jira/browse/HIVE-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-15683:
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0
     Release Note: Document the new configuration for 2.2.0.
           Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review, Chao!

> Make what's done in HIVE-15580 for group by configurable
> --------------------------------------------------------
>
>                 Key: HIVE-15683
>                 URL: https://issues.apache.org/jira/browse/HIVE-15683
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 2.2.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>              Labels: TODOC2.2
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15683.1.patch, HIVE-15683.2.patch, HIVE-15683.patch
>
>
> HIVE-15580 changed the way the data is shuffled for group by: instead of using Spark's groupByKey to shuffle data, Hive on Spark now uses repartitionAndSortWithinPartitions(), which generates (key, value) pairs instead of the original (key, value iterator).
> This might have some performance implications, but it's needed to get rid of unbounded memory usage by {{groupByKey}}.
> Here we'd like to compare group by performance with and without HIVE-15580. If the impact is significant, we can provide a configuration that allows users to switch back to the original way of shuffling.
> This work should ideally be done after HIVE-15682, as the optimization there should help the performance here as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
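
For readers unfamiliar with the two shuffle shapes described in the issue, here is a minimal sketch in plain Python. It is not Hive or Spark code and calls no Spark API: the function names group_by_key, sorted_pairs and sum_streaming are hypothetical, and the sum aggregation is only an illustrative choice. It shows why a key-sorted stream of flat (key, value) pairs can be aggregated while holding a single running value, whereas the (key, value iterator) shape collects all values for a key together.

```python
from operator import itemgetter


def group_by_key(pairs):
    """Mimic the (key, value iterator) shape: every value for a key
    is collected into one record before it can be consumed."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return sorted(grouped.items())


def sorted_pairs(pairs):
    """Mimic the flat (key, value) shape, sorted by key within a partition.
    Assumption: this in-memory sort only stands in for the shuffle's sort."""
    return sorted(pairs, key=itemgetter(0))


def sum_streaming(sorted_stream):
    """Aggregate a key-sorted pair stream one record at a time,
    emitting a total whenever the key changes."""
    current_key, total, started = None, 0, False
    for key, value in sorted_stream:
        if started and key != current_key:
            yield current_key, total
            total = 0
        current_key, started = key, True
        total += value
    if started:
        yield current_key, total
```

With pairs such as ("b", 1), ("a", 2), ("b", 3), ("a", 4), the first function yields one record per key holding a list of values, while the streaming path only needs to detect key boundaries in the sorted stream. The extra sort is presumably where the performance implications mentioned in the issue would come from.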