Date: Thu, 1 Aug 2013 00:09:48 +0000 (UTC)
From: "Harish Butani (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-4966) Introduce Collect_Map UDAF

    [ https://issues.apache.org/jira/browse/HIVE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725882#comment-13725882 ]

Harish Butani commented on HIVE-4966:
-------------------------------------

For my understanding, are you adding a new function collect_array, or are you enhancing collect_set to have a dedup=true/false option?

The signatures of collect_map and collect_set/array are different, so we have to expose them as separate functions. But I am open to sharing a single implementation. Makes sense. What specifically do you have in mind?
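To make the signature difference concrete, here is a minimal sketch of the semantics in Python rather than in Hive's Java UDAF API. Everything in it is an assumption for illustration: the shared helper `collect`, the `dedup` flag, the sample rows, the first-seen ordering, and the rule that a later value overwrites an earlier one for the same map key. None of this is taken from an actual Hive implementation.

```python
from collections import defaultdict


def collect(rows, key_fn, value_fn, dedup):
    """Hypothetical shared core: gather values per group key, optionally dropping duplicates."""
    groups = defaultdict(list)
    for row in rows:
        values = groups[key_fn(row)]
        value = value_fn(row)
        if not dedup or value not in values:
            values.append(value)
    return dict(groups)


def collect_array(rows, key_fn, value_fn):
    # One value argument, duplicates kept.
    return collect(rows, key_fn, value_fn, dedup=False)


def collect_set(rows, key_fn, value_fn):
    # One value argument, duplicates dropped (first-seen order kept here by assumption).
    return collect(rows, key_fn, value_fn, dedup=True)


def collect_map(rows, key_fn, map_key_fn, map_value_fn):
    # Two value arguments (map key and map value), hence the different signature.
    # Assumption: a later value for the same map key overwrites the earlier one.
    pairs = collect(rows, key_fn, lambda r: (map_key_fn(r), map_value_fn(r)), dedup=False)
    return {group: dict(items) for group, items in pairs.items()}


# Made-up rows shaped like Txn(customer, product, amt).
txn = [
    ("alice", "book", 12.0),
    ("alice", "pen", 1.5),
    ("alice", "book", 12.0),
    ("bob", "pen", 1.5),
]


def by_customer(row):
    return row[0]


products_array = collect_array(txn, by_customer, lambda r: r[1])
products_set = collect_set(txn, by_customer, lambda r: r[1])
activity_map = collect_map(txn, by_customer, lambda r: r[1], lambda r: r[2])
```

In this sketch the three entry points stay separate because their argument lists differ, while the grouping loop is written once. Whether a real Hive implementation could share code this way is the open question in the comment above.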
> Introduce Collect_Map UDAF
> --------------------------
>
>                 Key: HIVE-4966
>                 URL: https://issues.apache.org/jira/browse/HIVE-4966
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>
> Similar to Collect_Set. For e.g. on a Txn table
> {noformat}
> Txn(customer, product, amt)
> select customer, collect_map(product, amt)
> from txn
> group by customer
> {noformat}
> Would give you an activity map for each customer.
> Other thoughts:
> - have explode do the inverse on maps just as it does for sets today.
> - introduce a table function that outputs each value as a column. So in the e.g. above you get an activity matrix instead of a map.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira