Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 16 Jan 2013 16:46:12 +0000 (UTC)
From: "Ashutosh Chauhan (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12625804.1357179739300.145941.1358354772941@arcas>
In-Reply-To: <JIRA.12625804.1357179739300@arcas>
References: <JIRA.12625804.1357179739300@arcas>
Subject: [jira] [Commented] (HIVE-3852) Multi-groupby optimization fails
 when same distinct column is used twice or more
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555191#comment-13555191 ] 

Ashutosh Chauhan commented on HIVE-3852:
----------------------------------------

Namit,
bq. Should we have this optimization now ?
I am not sure which particular optimization you are referring to. I assume you mean that reduce-side group-bys are no longer needed now that we have map-side aggregation. If so, I think they are still required. As Navis pointed out, if the reduction ratio is not high enough, mappers may run out of memory, and then we suggest that users turn off map-side aggregation.
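
For reference, a minimal sketch of the session settings involved, assuming the standard Hive configuration properties; the values shown are only illustrative, not recommendations for any particular workload:
{code}
-- Turn off map-side (hash) aggregation, so grouping happens on the reduce side only.
SET hive.map.aggr=false;

-- Alternatively, keep map-side aggregation on and tune when it backs off.
SET hive.map.aggr=true;
-- Hash aggregation is abandoned if (hash table size / input rows) exceeds this ratio.
SET hive.map.aggr.hash.min.reduction=0.5;
-- Portion of mapper memory the aggregation hash table may use.
SET hive.map.aggr.hash.percentmemory=0.5;
{code}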

                
> Multi-groupby optimization fails when same distinct column is used twice or more
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-3852
>                 URL: https://issues.apache.org/jira/browse/HIVE-3852
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3852.D7737.1.patch
>
>
> {code}
> FROM INPUT
> INSERT OVERWRITE TABLE dest1 
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct substr(INPUT.value,5)) GROUP BY INPUT.key
> INSERT OVERWRITE TABLE dest2 
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct substr(INPUT.value,5)) GROUP BY INPUT.key;
> {code}
> fails with exception FAILED: IndexOutOfBoundsException Index: 0,Size: 0
