From: Aurora Skarra-Gallagher
To: "user@hive.apache.org"
CC: "Christopher, Pat", Steven Wong
Date: Fri, 11 Mar 2011 19:00:12 -0800
Subject: Re: UDAF documentation

I'll just keep responding to myself. ;)

I ended up figuring out how to do it. I just used junit and called init, iterate, terminatePartial, etc from inside the unit test. After knowing a typical flow of function calls (as I mentioned below), the main other gotcha is making sure to have a new UDAF object for each instance. For example, in my example below, there would be three separate UDAF instances.

-Aurora

On Mar 11, 2011, at 5:02 PM, Aurora Skarra-Gallagher wrote:

> I'm looking for something like this, but for a UDAF instead of a UDF:
> http://svn.apache.org/repos/asf/hive/branches/branch-0.7/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFDateDiff.java
>
> -Aurora
>
> On Mar 11, 2011, at 4:44 PM, Aurora Skarra-Gallagher wrote:
>
>> Hi,
>>
>> Did you actually call those functions directly from your unit tests? I'm looking for examples of that working, but all I see reference to are tests to make sure the query produces the expected output (rather than directly testing the UDAF).
>>
>> -Aurora
>>
>> On Mar 11, 2011, at 3:44 PM, Christopher, Pat wrote:
>>
>>> Awesome, awesome.
>>> That's what I had pieced together from Steve and Ed's emails. Glad to get confirmation on it.
>>>
>>> It's also what I did for my unit testing. I also called everything with null arguments to make sure those got handled gracefully.
>>>
>>> Pat
>>>
>>> -----Original Message-----
>>> From: Aurora Skarra-Gallagher [mailto:aurora@yahoo-inc.com]
>>> Sent: Friday, March 11, 2011 3:40 PM
>>> To: user@hive.apache.org
>>> Cc: Steven Wong
>>> Subject: Re: UDAF documentation
>>>
>>> Hadoop: The Definitive Guide has a good section on this. Chapter 12: Hive: User Defined Functions. It has a diagram that shows how things are called and when. The example I'm looking at shows this sequence:
>>>
>>> (first instance)
>>> init()
>>> iterate(1)
>>> iterate(2)
>>> iterate(3)
>>> terminatePartial()
>>>
>>> (second instance)
>>> init()
>>> iterate(4)
>>> iterate(2)
>>> terminatePartial()
>>>
>>> (then)
>>> init()
>>> merge(3)
>>> merge(4)
>>> terminate()
>>>
>>> The UDAF being described is a max integer function, hence each merge() call receiving the highest integer from one of the instances.
>>>
>>> -Aurora
>>>
>>> On Mar 11, 2011, at 9:54 AM, Christopher, Pat wrote:
>>>
>>>> Ahh, perfect. The docs don't agree terribly well but the case study is great. The context for when merge() gets called was not clear to me.
>>>>
>>>> Thanks guys!
>>>>
>>>> Pat
>>>>
>>>> -----Original Message-----
>>>> From: Steven Wong [mailto:swong@netflix.com]
>>>> Sent: Thursday, March 10, 2011 6:24 PM
>>>> To: user@hive.apache.org
>>>> Cc: Christopher, Pat
>>>> Subject: RE: UDAF documentation
>>>>
>>>> Take a look at http://wiki.apache.org/hadoop/Hive/GenericUDAFCaseStudy, in case you haven't found it already.
>>>>
>>>> -----Original Message-----
>>>> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
>>>> Sent: Thursday, March 10, 2011 6:18 PM
>>>> To: user@hive.apache.org
>>>> Cc: Christopher, Pat
>>>> Subject: Re: UDAF documentation
>>>>
>>>> On Thu, Mar 10, 2011 at 8:27 PM, Christopher, Pat wrote:
>>>>> Hi Guys,
>>>>>
>>>>> I'm writing a UDAF to run against hive 0.5 or hive 0.7. The documentation I
>>>>> can find says to implement UDAFEvaluator and ensure that you implement
>>>>> init(), aggregate() and evaluate(). However, all of the examples I can
>>>>> find implement init(), iterate(), merge(), terminatePartial() and
>>>>> terminate().
>>>>>
>>>>> What's the difference and where can I find the documentation on how to write
>>>>> a UDAF?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Pat
>>>>
>>>> At times the documentation may lag behind the code. I would check out
>>>> the hive source code for the version you are working with and base
>>>> your work on other already existing UDAFs that are similar.
>>>>
>>>> Edward
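
For anyone who finds this thread later, here is a rough sketch of the call sequence quoted above, replayed by hand the way a unit test would do it. Everything in it is an assumption for illustration: MaxIntEvaluatorSketch is a hypothetical plain-Java stand-in that only borrows the method names from the thread (init, iterate, terminatePartial, merge, terminate). It does not extend or import any Hive class, so treat it as a picture of the flow rather than working Hive code. Skipping null inputs is also just one possible way to handle the null-argument case Pat mentioned.

```java
// Hypothetical stand-in for a max-integer UDAF evaluator.
// It borrows the method names discussed in the thread but is NOT a Hive class.
class MaxIntEvaluatorSketch {
    private Integer max; // null means "no value seen yet"

    // Reset the aggregation state.
    public void init() {
        max = null;
    }

    // Fold one input value into the running max; null inputs are skipped.
    public boolean iterate(Integer value) {
        if (value != null && (max == null || value > max)) {
            max = value;
        }
        return true;
    }

    // Hand back the partial result computed by this instance.
    public Integer terminatePartial() {
        return max;
    }

    // Fold another instance's partial result into this instance.
    public boolean merge(Integer partial) {
        return iterate(partial);
    }

    // Final result of the aggregation.
    public Integer terminate() {
        return max;
    }

    // Replays the sequence from the thread with three separate instances.
    public static Integer replayThreadSequence() {
        MaxIntEvaluatorSketch first = new MaxIntEvaluatorSketch();
        first.init();
        first.iterate(1);
        first.iterate(2);
        first.iterate(3);
        Integer firstPartial = first.terminatePartial(); // 3

        MaxIntEvaluatorSketch second = new MaxIntEvaluatorSketch();
        second.init();
        second.iterate(4);
        second.iterate(2);
        Integer secondPartial = second.terminatePartial(); // 4

        MaxIntEvaluatorSketch merger = new MaxIntEvaluatorSketch();
        merger.init();
        merger.merge(firstPartial);
        merger.merge(secondPartial);
        return merger.terminate();
    }

    public static void main(String[] args) {
        System.out.println(replayThreadSequence()); // prints 4
    }
}
```

The point of using three objects is the gotcha mentioned at the top of this message: each stage of the flow gets its own evaluator instance, so state from one must only reach another through terminatePartial() and merge().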