From: java8964
To: user@hive.apache.org
Subject: RE: Does hive instantiate new udf object for each record
Date: Tue, 25 Mar 2014 09:57:25 -0400

The reason you saw that is because the evaluate() method you provided takes no argument, so you didn't specify the type of column it can be applied to. So Hive just creates a new test instance again and again for every new row, as it doesn't know how, or to which column, to apply your UDF.

I changed your code as below:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class test extends UDF {
    private Text t;

    // One-argument version, so the UDF can be applied to a column: test(colA).
    // Returns "initialization" on the first call on this instance, "OK" afterwards.
    public Text evaluate(String s) {
        if (t == null) {
            t = new Text("initialization");
        } else {
            t = new Text("OK");
        }
        return t;
    }

    // Original zero-argument version.
    public Text evaluate() {
        if (t == null) {
            t = new Text("initialization");
        } else {
            t = new Text("OK");
        }
        return t;
    }
}

Now, if you invoke your UDF like this:

select test(colA) from AnyTable;

You should see "initialization" once (per map task) and "OK" for the rest. Makes sense?

Yong
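To see what the changed UDF would print if Hive reuses one `test` instance for every row in a task, the row loop can be simulated in plain Java. This is a sketch under that assumption, not Hive's actual execution path: `ReuseDemo`, `TestLogic` and `runTask` are hypothetical names, `TestLogic.evaluate` mirrors the body of `evaluate(String s)` but returns a `String` instead of Hadoop's `Text` so it compiles without Hive or Hadoop on the classpath, and the `for` loop stands in for Hive feeding rows of `colA` to the UDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReuseDemo {
    // Plain-Java stand-in for the "test" UDF: String instead of
    // org.apache.hadoop.io.Text, so no Hive or Hadoop jars are needed.
    static class TestLogic {
        private String t;

        String evaluate(String s) {
            if (t == null) {
                t = "initialization";   // first call on this instance
            } else {
                t = "OK";               // every later call on the same instance
            }
            return t;
        }
    }

    // Simulates one task: a single instance is created and reused for all rows.
    static List<String> runTask(List<String> colA) {
        TestLogic udf = new TestLogic();
        List<String> out = new ArrayList<>();
        for (String value : colA) {
            out.add(udf.evaluate(value));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [initialization, OK, OK, OK]
        System.out.println(runTask(Arrays.asList("a", "b", "c", "d")));
    }
}
```

If every row instead printed "initialization", that would mean each row saw a fresh instance (or a value computed once on a fresh instance), which is the symptom reported below.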


From: sky880883368@hotmail.com
To: user@hive.apache.org
Subject: RE: Does hive instantiate new udf object for each record
Date: Tue, 25 Mar 2014 10:17:46 +0800

I have implemented a simple UDF for testing.


import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class test extends UDF {
    private Text t;

    // Returns "initialization" on the first call on this instance, "OK" afterwards.
    public Text evaluate() {
        if (t == null) {
            t = new Text("initialization");
        } else {
            t = new Text("OK");
        }
        return t;
    }
}


And the test query: select test() from AnyTable;
I got
initialization
initialization
initialization...


I have also implemented a similar GenericUDF, and got a similar result.

What's wrong with my code?

Best Regards,
ypg

From: java8964@hotmail.com
To: user@hive.apache.org
Subject: RE: Does hive instantiate new udf object for each record
Date: Mon, 24 Mar 2014 16:58:49 -0400

Your UDF object will only be initialized once per map or reduce task.

When you say your UDF object is being initialized for each row, why do you think so? Do you have logs that make you think that way?

If so, please provide more information, such as your example code and logs, so we can help you.
Yong
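The claim that a UDF object is created once per map or reduce task, rather than once per row, can be illustrated with a small instance-counting simulation. This is a sketch under stated assumptions: it does not use Hive at all; `PerTaskDemo`, `CountingUdf` and `run` are hypothetical names; and the nested loops only model "each task creates one instance and feeds it all of its rows". Under that model the constructor runs once per task, so the count of instances (and of "initialization" values) equals the number of tasks, not the number of rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PerTaskDemo {
    // Counts how many UDF instances were constructed across all simulated tasks.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    // Hypothetical UDF stand-in whose constructor records each new instance.
    static class CountingUdf {
        private String t;

        CountingUdf() {
            INSTANCES.incrementAndGet();
        }

        String evaluate(String s) {
            t = (t == null) ? "initialization" : "OK";
            return t;
        }
    }

    // Runs `tasks` simulated tasks over `rowsPerTask` rows each
    // and returns every value the UDF produced, in order.
    static List<String> run(int tasks, int rowsPerTask) {
        List<String> out = new ArrayList<>();
        for (int task = 0; task < tasks; task++) {
            CountingUdf udf = new CountingUdf();   // one instance per task
            for (int row = 0; row < rowsPerTask; row++) {
                out.add(udf.evaluate("row" + row));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> out = run(3, 4);
        long inits = out.stream().filter("initialization"::equals).count();
        // Prints instances=3, initialization rows=3
        System.out.println("instances=" + INSTANCES.get() + ", initialization rows=" + inits);
    }
}
```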


Date: Tue, 25 Mar 2014 00:30:21 +0800
From: sky880883368@hotmail.com
To: user@hive.apache.org
Subject: Does hive instantiate new udf object for each record

Hi all,

I'm trying to implement a UDF which makes use of some data structures, such as a binary tree.

However, it seems that Hive instantiates a new UDF object for each row in the table, so the data structures are also initialized again and again for each row.

In contrast, in the book "Programming Hive", a geoip function is given as an example showing that a LookupService object "is saved in a reference so it only needs to be initialized once in the lifetime of a map or reduce task that initializes it". The code for this function can be found here (https://github.com/edwardcapriolo/hive-geoip/).

Could anyone give me some ideas on how to make the UDF object initialize only once in the lifetime of a map or reduce task?

Best Regards,
ypg
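The pattern quoted from the book, keeping the expensive structure in an instance field and building it only on the first call, can be sketched in plain Java. Assumptions: `RangeLookup`, `buildIndex()` and the hard-coded ranges are hypothetical stand-ins for loading real data; `java.util.TreeMap` (a red-black tree) plays the role of the binary tree; and Hive/Hadoop types are left out so the sketch compiles on its own. The saving only materializes if the same instance is called for many rows, which is what the replies in this thread say happens within a map or reduce task.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical UDF-style class: the lookup tree is built lazily, once per instance.
class RangeLookup {
    private TreeMap<Long, String> index;   // null until the first evaluate() call
    private int builds = 0;                // how many times the tree was built

    // Stand-in for loading a real data file; the ranges are made up.
    private TreeMap<Long, String> buildIndex() {
        builds++;
        TreeMap<Long, String> tree = new TreeMap<>();
        tree.put(0L, "low");
        tree.put(100L, "medium");
        tree.put(1000L, "high");
        return tree;
    }

    // Returns the label of the greatest range start <= key, or null if none.
    public String evaluate(Long key) {
        if (key == null) {
            return null;
        }
        if (index == null) {
            index = buildIndex();          // expensive work happens only once
        }
        Map.Entry<Long, String> e = index.floorEntry(key);
        return e == null ? null : e.getValue();
    }

    public int builds() {
        return builds;
    }

    public static void main(String[] args) {
        RangeLookup udf = new RangeLookup();
        for (long k : new long[] {5, 150, 2000, -1}) {
            System.out.println(k + " -> " + udf.evaluate(k));
        }
        System.out.println("builds = " + udf.builds());
    }
}
```

Each lookup uses `TreeMap.floorEntry`, which finds the nearest range start at or below the key in O(log n); the null check on `index` is the "save it in a reference" step, so the tree is built once no matter how many rows follow.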
