Date: Thu, 10 Apr 2014 23:53:16 +0000 (UTC)
From: "Harish Butani (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution

    [ https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966025#comment-13966025 ]

Harish Butani commented on HIVE-6873:
-------------------------------------

+1 for 0.13

> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-6873
>                 URL: https://issues.apache.org/jira/browse/HIVE-6873
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0, 0.14.0
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>         Attachments: HIVE-6873.1.patch, HIVE-6873.2.patch, HIVE-6873.3.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause. This causes incorrect results. Because of how GroupByOperatorDesc adds the DISTINCT keys to the overall aggregate keys, the vectorized aggregates do account for the extra key, but they do not process the data correctly for that key. The reduce side then aggregates the input from the vectorized map side into results that are only sometimes correct but mostly incorrect. HIVE-4607 tracks the proper fix, but in the meantime I'm filing a bug to disable vectorized execution if DISTINCT is present. The fix is trivial.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
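
A minimal sketch of the two points in the issue description: an aggregate that ignores DISTINCT returns a different answer whenever the input has duplicates, and the stopgap is to refuse vectorization when any aggregate carries DISTINCT. The names `AggregateCall`, `can_vectorize`, and `count` are hypothetical and exist only for this illustration; they are not Hive classes, and Hive's real check lives in its Java query planner and may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateCall:
    """Hypothetical stand-in for one aggregate in a GROUP BY plan."""
    function: str
    column: str
    distinct: bool = False


def can_vectorize(aggregates):
    """Return False when any aggregate uses DISTINCT, so the caller
    can fall back to row-at-a-time execution (assumed behavior)."""
    return not any(agg.distinct for agg in aggregates)


def count(values, distinct=False):
    """COUNT over non-null values, honoring DISTINCT when requested."""
    non_null = [v for v in values if v is not None]
    return len(set(non_null)) if distinct else len(non_null)


rows = [1, 1, 2, 2, 2, 3]

# Ignoring DISTINCT counts every row; honoring it counts unique values.
plain = count(rows)
unique = count(rows, distinct=True)

# A plan with one DISTINCT aggregate should not be vectorized.
plan = [AggregateCall("count", "x", distinct=True), AggregateCall("sum", "y")]
vectorizable = can_vectorize(plan)
```

With the sample rows, the plain count and the distinct count disagree, which is the kind of wrong result the issue describes when DISTINCT is silently dropped.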