Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E684ADAE9 for ; Fri, 2 Nov 2012 03:17:32 +0000 (UTC) Received: (qmail 91667 invoked by uid 500); 2 Nov 2012 03:17:31 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 91093 invoked by uid 500); 2 Nov 2012 03:17:31 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 91063 invoked by uid 99); 2 Nov 2012 03:17:30 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Nov 2012 03:17:30 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of edlinuxguru@gmail.com designates 209.85.223.176 as permitted sender) Received: from [209.85.223.176] (HELO mail-ie0-f176.google.com) (209.85.223.176) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Nov 2012 03:17:25 +0000 Received: by mail-ie0-f176.google.com with SMTP id k11so5192829iea.35 for ; Thu, 01 Nov 2012 20:17:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; bh=faqxbNcYVe5MSM/Ag84U1iTmkfm5UpNisKRSk2UOoQY=; b=vBmZP/SXL6//FglQctievXDcGPvCn6taGVMckHXTiUgXqT5MQDsy2ePN8+BcZTHcTM NC38S/NVq28YXf+hbpFgSITcmOyKjEwiEUTAuMnYJ2L3MptcN+pScj58mnNJcugzdAZD 3XrY6YrRMf28s5KlKHJRapG6lvRov9r0YD+2R2GukijquGmLUpWcKVATarjxnkL5Ywh+ fAW0FkTPFzdkfzHhLMrLR6B8Q7RCWcR3/9pbQXyjh1+WsqILvZ0Xx9ARdx/YQCzYQS/3 bCRyOYUbve93FC2Ly00g5vcYedueUOt7xCU/VIiyHLpcPoYW0nPF1p4MKDsNOfHY4ggY /yjA== MIME-Version: 1.0 Received: by 10.50.157.200 with SMTP id wo8mr380034igb.29.1351826225011; Thu, 01 Nov 2012 20:17:05 -0700 (PDT) Received: by 10.64.97.106 with HTTP; Thu, 1 Nov 2012 20:17:04 -0700 (PDT) In-Reply-To: References: Date: Thu, 1 Nov 2012 23:17:04 -0400 Message-ID: Subject: Re: Group By Concatenation From: Edward Capriolo To: user@hive.apache.org Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org Collect_set() is built into hive. If you want a version that does not de-duplicate look here. https://github.com/edwardcapriolo/hive-collect Caution both of these functions can produce out of memory if the results are later then a mapper can store in memory. On Thu, Nov 1, 2012 at 2:27 PM, Ratner, Alan S (IS) w= rote: > Sorry to ask what is probably a very na=EFve Hive question but here goes: > > > > I have a table as follows: > > Col1 Col2 > > K1 V1 > > K1 V1 > > K2 V1 > > K3 V1 > > K1 V2 > > K1 V3 > > K2 V2 > > > > Now I have managed to SELECT Col1,COUNT(DISTINCT Col2) FROM =85 BY COL1; = to > obtain > > K1 3 > > K2 2 > > K3 1 > > > > But what I want is a concatenated list of all the distinct Col2 values fo= r > each Col1 key. > > K1 V1 V2 V3 > > K2 V1 V2 > > K3 V1 > > > > This is something absolutely trivial in MR but I cannot seem to find > anything in Hive that will do this for me. Do I have to write a UDF to > accomplish this? > > > > > > Alan > >