Subject: Re: UDAF on AWS Hive
From: Matthew Bryan
To: hive-user@hadoop.apache.org
Date: Tue, 6 Apr 2010 13:34:50 -0400

Thanks Zheng, and thanks for your great support to this list. I took your idea and wrote the following code, which worked for me. I'm no Java whiz, so it's probably fairly inefficient. I do get to talk to the Amazon folks from time to time, so I'll definitely mention my interest in upgrading the Hive version. Thanks again.
Matt

package com.company.hadoop.hive.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import java.util.Arrays;

public class UDAFGroupConcat extends UDAF {

    public static class GroupConcatStringEvaluator implements UDAFEvaluator {
        private Text mOutput;
        private boolean mEmpty;

        public GroupConcatStringEvaluator() {
            super();
            init();
        }

        public void init() {
            mOutput = null;
            mEmpty = true;
        }

        public boolean iterate(Text o, IntWritable N) {
            if (o != null) {
                if (mEmpty) {
                    mOutput = new Text(N + " " + o.toString());
                    mEmpty = false;
                } else {
                    String temp = mOutput.toString() + "\t" + N + " " + o.toString();
                    String[] split = temp.split("\t");
                    Arrays.sort(split);
                    String sorted = split[0];
                    for (int i = 1; i < split.length; i++) {
                        sorted = sorted + "\t" + split[i];
                    }
                    mOutput.set(sorted);
                }
            }
            return true;
        }

        public Text terminatePartial() { return mEmpty ? null : mOutput; }

        public boolean merge(Text o) {
            if (o != null) {
                if (mEmpty) {
                    mOutput = new Text(o.toString());
                    mEmpty = false;
                } else {
                    String temp = mOutput.toString() + "\t" + o.toString();
                    String[] split = temp.split("\t");
                    Arrays.sort(split);
                    String sorted = split[0];
                    for (int i = 1; i < split.length; i++) {
                        sorted = sorted + "\t" + split[i];
                    }
                    mOutput.set(sorted);
                }
            }
            return true;
        }

        public Text terminate() { return mEmpty ? null : mOutput; }
    }
}

On Fri, Apr 2, 2010 at 4:11 PM, Matthew Bryan wrote:
> I'm writing a basic group_concat UDAF for the Amazon version of
> Hive....and it's working fine for unordered groupings. But I can't
> seem to get an ordered version working (filling an array based on an
> IntWritable passed alongside).
> When I move from using a Text return type
> on terminatePartial() to either Text[] or a State class, I start
> getting errors:
>
> FAILED: Error in semantic analysis:
> org.apache.hadoop.hive.ql.metadata.HiveException: Cannot recognize
> return type class [Lorg.apache.hadoop.io.Text; from public
> org.apache.hadoop.io.Text[]
> com.company.hadoop.hive.udaf.UDAFGroupConcatN$GroupConcatNStringEvaluator.terminatePartial()
>
> or
>
> FAILED: Error in semantic analysis:
> org.apache.hadoop.hive.ql.metadata.HiveException: Cannot recognize
> return type class
> com.company.hadoop.hive.udaf.UDAFGroupConcatN$UDAFGroupConcatNState from public
> com.company.hadoop.hive.udaf.UDAFGroupConcatN$UDAFGroupConcatNState
> com.company.hadoop.hive.udaf.UDAFGroupConcatN$GroupConcatNStringEvaluator.terminatePartial()
>
> What limits are there on the return type of
> terminatePartial()? Shouldn't it just have to match the argument of
> merge() and nothing more? Keep in mind this is the Amazon version of
> Hive (0.4, I think).
>
> I put both versions of the UDAF below, ordered and unordered.
>
> Thanks for your time.
>
> Matt
>
>
> ######### Working Unordered ############
> /*QUERY: select user, event, group_concat(details) from datatable
> group by user,event;*/
>
> package com.company.hadoop.hive.udaf;
>
> import org.apache.hadoop.hive.ql.exec.UDAF;
> import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
> import org.apache.hadoop.io.Text;
>
> public class UDAFGroupConcat extends UDAF {
>
>     public static class GroupConcatStringEvaluator implements UDAFEvaluator {
>         private Text mOutput;
>         private boolean mEmpty;
>
>         public GroupConcatStringEvaluator() {
>             super();
>             init();
>         }
>
>         public void init() {
>             mOutput = null;
>             mEmpty = true;
>         }
>
>         public boolean iterate(Text o) {
>             if (o != null) {
>                 if (mEmpty) {
>                     mOutput = new Text(o);
>                     mEmpty = false;
>                 } else {
>                     mOutput.set(mOutput.toString() + " " + o.toString());
>                 }
>             }
>             return true;
>         }
>
>         public Text terminatePartial() { return mEmpty ? null : mOutput; }
>         public boolean merge(Text o) { return iterate(o); }
>         public Text terminate() { return mEmpty ? null : mOutput; }
>     }
> }
>
> ############ Not Working Ordered #############
> /*QUERY: select user, event, group_concatN(details, detail_id) from
> datatable group by user,event;*/
>
> package com.company.hadoop.hive.udaf;
>
> import org.apache.hadoop.hive.ql.exec.UDAF;
> import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.io.IntWritable;
>
> public class UDAFGroupConcatN extends UDAF {
>
>     public static class GroupConcatNStringEvaluator implements UDAFEvaluator {
>
>         private Text[] mArray;
>         private boolean mEmpty;
>
>         public GroupConcatNStringEvaluator() {
>             super();
>             init();
>         }
>
>         public void init() {
>             mArray = new Text[5];
>             mEmpty = true;
>         }
>
>         public boolean iterate(Text o, IntWritable N) {
>             if (o != null && N != null) {
>                 mArray[N.get()].set(o.toString());
>                 mEmpty = false;
>             }
>             return true;
>         }
>
>         public Text[] terminatePartial() { return mEmpty ? null : mArray; }
>
>         public boolean merge(Text[] o) {
>             if (o != null) {
>                 for (int i = 0; i <= 5; i++) {
>                     if (mArray[i].getLength() == 0) {
>                         mArray[i].set(o[i].toString());
>                     }
>                 }
>             }
>             return true;
>         }
>
>         public Text[] terminate() { return mEmpty ? null : mArray; }
>     }
> }
>
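[Editor's note] The working version at the top of this thread keeps entries ordered by prefixing each value with its index N, so Arrays.sort over the tab-split entries restores group order. A minimal standalone sketch of that trick (class name and sample data are invented for illustration; note the sort is lexicographic, so numeric prefixes only compare correctly while they have the same number of digits):

```java
import java.util.Arrays;

public class SortedConcatDemo {
    // Join tab-separated "N value" entries from two partial results and
    // restore index order by sorting the split entries lexicographically,
    // mirroring the merge() logic in the UDAF above.
    static String merge(String a, String b) {
        String[] split = (a + "\t" + b).split("\t");
        Arrays.sort(split);
        StringBuilder sorted = new StringBuilder(split[0]);
        for (int i = 1; i < split.length; i++) {
            sorted.append('\t').append(split[i]);
        }
        return sorted.toString();
    }

    public static void main(String[] args) {
        // Two partial results arriving out of order, as can happen
        // when Hive combines partial aggregates across mappers.
        String merged = merge("2 gamma\t0 alpha", "1 beta");
        System.out.println(merged.replace('\t', '|')); // 0 alpha|1 beta|2 gamma
    }
}
```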