From: "Jesus Camacho Rodriguez (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Date: Sat, 6 Oct 2018 22:43:00 +0000 (UTC)
Subject: [jira] [Commented] (HIVE-17043) Remove non unique columns from group by keys if not referenced later

    [ https://issues.apache.org/jira/browse/HIVE-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640892#comment-16640892 ]

Jesus Camacho Rodriguez commented on HIVE-17043:
------------------------------------------------

[~vgarg], I left three minor comments in RB; could you take a look? Other than that, I think the patch LGTM.

> Remove non unique columns from group by keys if not referenced later
> --------------------------------------------------------------------
>
>                 Key: HIVE-17043
>                 URL: https://issues.apache.org/jira/browse/HIVE-17043
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Logical Optimizer
>   Affects Versions: 3.0.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Vineet Garg
>            Priority: Major
>         Attachments: HIVE-17043.1.patch, HIVE-17043.10.patch, HIVE-17043.11.patch, HIVE-17043.2.patch, HIVE-17043.3.patch, HIVE-17043.4.patch, HIVE-17043.5.patch, HIVE-17043.6.patch, HIVE-17043.7.patch, HIVE-17043.8.patch, HIVE-17043.9.patch
>
>
> Group by keys may be a mix of unique (or primary) keys and regular columns. In such cases, the presence of the regular columns won't alter the cardinality of the groups. So, if the regular columns are not referenced later, they can be dropped from the group by keys.
> Depending on the operator tree, this may mean those columns are not read from disk at all in the best case. In the worst case, we still avoid shuffling and sorting the regular columns from mapper to reducer, which could be a substantial CPU and network saving.
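
For illustration only, here is a minimal sketch of the idea in the description. It is not the code from the attached patches and does not use Hive. The table `orders`, its columns, and the helper `prune_group_by_keys` are all hypothetical; SQLite stands in for the query engine purely to show that the grouping result does not change when a regular column is dropped alongside a primary key.

```python
import sqlite3

# Hypothetical table: `id` is the primary key, `region` is a regular column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10), (2, "east", 20), (3, "west", 5), (4, "west", 5)],
)

# Group by keys mix the unique key with a regular column that is not
# referenced in the select list.
full_keys = conn.execute(
    "SELECT id, SUM(amount) FROM orders GROUP BY id, region ORDER BY id"
).fetchall()

# Same query with the regular column dropped from the group by keys.
reduced_keys = conn.execute(
    "SELECT id, SUM(amount) FROM orders GROUP BY id ORDER BY id"
).fetchall()

# `id` alone already identifies each group, so both queries agree.
assert full_keys == reduced_keys


def prune_group_by_keys(group_keys, unique_key, referenced_later):
    """Hypothetical helper sketching the rewrite condition.

    group_keys:       ordered list of group by column names
    unique_key:       set of columns forming a unique (or primary) key
    referenced_later: set of columns used by operators above the group by
    """
    # Without a complete unique key among the group by keys, every
    # column can change the grouping, so nothing may be dropped.
    if not unique_key or not set(unique_key) <= set(group_keys):
        return list(group_keys)
    # Keep the unique key columns and anything still needed downstream.
    return [k for k in group_keys if k in unique_key or k in referenced_later]
```

With these assumptions, `prune_group_by_keys(["id", "region"], {"id"}, set())` keeps only `id`, while passing `{"region"}` as `referenced_later` leaves both keys in place.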