hive-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: Does hive instantiate new udf object for each record
Date Tue, 25 Mar 2014 13:57:25 GMT
The reason you see that is that your evaluate() method doesn't specify which column type
it can be applied to. So Hive just creates a new test instance again and again for every
row, as it doesn't know how, or to which column, to apply your UDF.
I changed your code as follows:
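import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class test extends UDF {
    // Kept between calls as long as the same instance is reused.
    private Text t;

    // Overload taking a column value, e.g. test(colA).
    public Text evaluate(String s) {
        if (t == null) {
            t = new Text("initialization"); // first call on this instance
        } else {
            t = new Text("OK");             // every later call
        }
        return t;
    }

    // No-argument overload, e.g. test().
    public Text evaluate() {
        if (t == null) {
            t = new Text("initialization");
        } else {
            t = new Text("OK");
        }
        return t;
    }
}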
Now, if you invoke your UDF like this:
select test(colA) from AnyTable;
You should see one "initialization" and the rest "OK". Does that make sense?
Yong
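The behaviour described above can be reproduced outside Hive with a minimal plain-Java sketch. It assumes, as stated in this thread, that one UDF instance is created per map task and evaluate() is called once per row. TestUdf and SimulatedMapTask are hypothetical names, and String stands in for org.apache.hadoop.io.Text so the sketch runs without Hive on the classpath; it does not show how Hive itself drives UDFs internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for the UDF above (String instead of Text).
class TestUdf {
    private String t;

    public String evaluate(String s) {
        // The field survives between calls on the same instance,
        // so only the first call sees it as null.
        if (t == null) {
            t = "initialization";
        } else {
            t = "OK";
        }
        return t;
    }
}

// Hypothetical driver mimicking one map task: a single UDF instance,
// with evaluate() called once per input row.
class SimulatedMapTask {
    static List<String> run(List<String> rows) {
        TestUdf udf = new TestUdf();      // created once for the whole task
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(udf.evaluate(row));   // called once per row
        }
        return out;
    }
}
```

Calling SimulatedMapTask.run on the rows "a", "b", "c" returns [initialization, OK, OK]: the first call sees the field as null, and every later call on the same instance does not.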
From: sky880883368@hotmail.com
To: user@hive.apache.org
Subject: RE: Does hive instantiate new udf object for each record
Date: Tue, 25 Mar 2014 10:17:46 +0800




I have implemented a simple UDF for testing.


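import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class test extends UDF {
    // Expected to survive between calls if the same instance is reused.
    private Text t;

    public Text evaluate() {
        if (t == null) {
            t = new Text("initialization"); // first call on this instance
        } else {
            t = new Text("OK");             // any later call
        }
        return t;
    }
}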

And the test query is: select test() from AnyTable;
I got:
initialization
initialization
initialization
...

I have also implemented a similar GenericUDF and got a similar result.

What's wrong with my code?

Best Regards,
ypg

From: java8964@hotmail.com
To: user@hive.apache.org
Subject: RE: Does hive instantiate new udf object for each record
Date: Mon, 24 Mar 2014 16:58:49 -0400




Your UDF object will only be initialized once per map or reduce task.
When you say your UDF object is being initialized for each row, why do you think so? Do you
have logs that make you think that way?
If so, please provide more information, like your example code, logs, etc., so we can help you.
Yong
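One way to get the kind of evidence asked for above is to log from the constructor and count constructions. The sketch below is a hypothetical instrumented class (InstrumentedUdf is not a Hive class); a real UDF would extend org.apache.hadoop.hive.ql.exec.UDF, but the class is kept plain here so it compiles without Hive. It assumes that stderr output from a task ends up in that task attempt's log.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical instrumented UDF-style class (plain Java, no Hive dependency).
class InstrumentedUdf {
    // Shared by all instances in this JVM, so it counts constructor calls.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    private final int id;
    private long calls;

    InstrumentedUdf() {
        id = INSTANCES.incrementAndGet();
        // Assumed to land in the task attempt's stderr log when run in a task.
        System.err.println("InstrumentedUdf instance #" + id + " created");
    }

    public String evaluate(String s) {
        calls++;  // counts calls on this particular instance
        return "instance " + id + ", call " + calls;
    }
}
```

If the task log shows one "created" line per task rather than one per row, the same object is being reused, and the call counter in the returned value keeps growing on that instance.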

Date: Tue, 25 Mar 2014 00:30:21 +0800
From: sky880883368@hotmail.com
To: user@hive.apache.org
Subject: Does hive instantiate new udf object for each record


Hi all,
        I'm trying to implement a UDF that makes use of some data structures like a binary
tree. However, it seems that Hive instantiates a new UDF object for each row in the
table, so the data structures would also be initialized again and again for each row.
           By contrast, in the book <Programming Hive>, a geoip function is given as an
example showing that a LookupService object "is saved in a reference so it only needs to be
initialized once in the lifetime of a map or reduce task that initializes it". The code for
this function can be found here (https://github.com/edwardcapriolo/hive-geoip/).
        Could anyone give me some ideas on how to make the UDF object initialize only once
in the lifetime of a map or reduce task?
    
Best Regards,
ypg
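The pattern the book describes (keep the expensive object in an instance field and build it on first use) can be sketched in plain Java. LazyLookupUdf, buildTree() and the sample data are hypothetical; java.util.TreeMap, a red-black tree, stands in for the binary tree in the question, and a real UDF would extend org.apache.hadoop.hive.ql.exec.UDF. The pattern only saves work if the same instance is reused across rows, which is the once-per-task behaviour discussed in this thread.

```java
import java.util.TreeMap;

// Hypothetical UDF-style class illustrating lazy, once-per-instance setup.
class LazyLookupUdf {
    // TreeMap (a red-black tree) stands in for the binary tree in the question.
    private TreeMap<String, Integer> tree;
    // Counts how often the expensive build ran (for illustration only).
    int builds = 0;

    private TreeMap<String, Integer> buildTree() {
        builds++;
        TreeMap<String, Integer> m = new TreeMap<>();
        // Hypothetical sample data; a real UDF might load it from a file.
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        return m;
    }

    public Integer evaluate(String key) {
        if (key == null) {
            return null;
        }
        // Build on first use only; later calls on this instance reuse the tree.
        if (tree == null) {
            tree = buildTree();
        }
        return tree.get(key);
    }
}
```

After any number of evaluate() calls on one instance, builds stays at 1, so the tree is constructed once per instance rather than once per row.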

