Return-Path:
X-Original-To: apmail-hive-dev-archive@www.apache.org
Delivered-To: apmail-hive-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 284701060A
	for ; Thu, 10 Apr 2014 23:33:30 +0000 (UTC)
Received: (qmail 78238 invoked by uid 500); 10 Apr 2014 23:33:19 -0000
Delivered-To: apmail-hive-dev-archive@hive.apache.org
Received: (qmail 78079 invoked by uid 500); 10 Apr 2014 23:33:17 -0000
Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@hive.apache.org
Delivered-To: mailing list dev@hive.apache.org
Received: (qmail 77957 invoked by uid 500); 10 Apr 2014 23:33:15 -0000
Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org
Received: (qmail 77943 invoked by uid 99); 10 Apr 2014 23:33:15 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Apr 2014 23:33:15 +0000
Date: Thu, 10 Apr 2014 23:33:15 +0000 (UTC)
From: "Jitendra Nath Pandey (JIRA)"
To: hive-dev@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Updated] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


     [ https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-6873:
---------------------------------------

    Status: Open  (was: Patch Available)

> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-6873
>                 URL: https://issues.apache.org/jira/browse/HIVE-6873
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0, 0.14.0
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>         Attachments: HIVE-6873.1.patch, HIVE-6873.2.patch, HIVE-6873.3.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause, which causes incorrect results. Because of how GroupByOperatorDesc adds the DISTINCT keys to the overall aggregate keys, the vectorized aggregates do account for the extra key, but they do not process the data correctly for that key. The reduce side then aggregates the input from the vectorized map side into results that are only sometimes correct and mostly incorrect. HIVE-4607 tracks the proper fix, but in the meantime I'm filing this bug to disable vectorized execution if DISTINCT is present. The fix is trivial.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
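
The failure mode described in the issue can be illustrated with a toy model. The sketch below is not Hive code and does not reproduce Hive's operators. It assumes a simplified two-phase aggregation in which the map side keys on (group, distinct column) but counts rows as if DISTINCT were absent, and the reduce side sums those partial counts. The class and method names (DistinctSketch, countDistinct, mapSideIgnoringDistinct, reduceSide) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DistinctSketch {

    // Reference semantics: COUNT(DISTINCT value) per group.
    // Each row is {group, value}.
    static Map<String, Long> countDistinct(List<String[]> rows) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (String[] r : rows) {
            seen.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[1]);
        }
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            out.put(e.getKey(), (long) e.getValue().size());
        }
        return out;
    }

    // Toy "map side" (assumption, not Hive's implementation): the key is
    // extended with the distinct column, but rows are simply counted.
    static Map<List<String>, Long> mapSideIgnoringDistinct(List<String[]> rows) {
        Map<List<String>, Long> partial = new HashMap<>();
        for (String[] r : rows) {
            partial.merge(Arrays.asList(r[0], r[1]), 1L, Long::sum);
        }
        return partial;
    }

    // Toy "reduce side": sums the partial counts per group.
    static Map<String, Long> reduceSide(Map<List<String>, Long> partial) {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<List<String>, Long> e : partial.entrySet()) {
            out.merge(e.getKey().get(0), e.getValue(), Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"a", "1"},
            new String[] {"a", "1"},   // duplicate value within group "a"
            new String[] {"a", "2"},
            new String[] {"b", "7"});  // group "b" has no duplicates
        System.out.println("COUNT(DISTINCT): " + countDistinct(rows));
        System.out.println("DISTINCT ignored: " + reduceSide(mapSideIgnoringDistinct(rows)));
    }
}
```

In this model, group "a" yields 3 instead of 2 because its duplicate row is counted, while group "b" yields 1 either way. That matches the report's observation that results are correct only some of the time, namely when a group happens to contain no duplicate values.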