hive-user mailing list archives

From Jason Dere <jd...@hortonworks.com>
Subject Re: UDF Configure method not getting called
Date Wed, 26 Aug 2015 00:05:19 GMT
For getting the configuration without configure(), this may not be the best thing to do, but
you can try doing it during your UDF's initialize() method. Note that initialize() is called
during query compilation, and also by each M/R task (followed at some point by configure()).

During initialize() you can call SessionState.get(); if it is not null, then this initialize()
call is happening during query compilation, and you can then use SessionState.get().getConf()
to get the configuration. GenericUDFBaseNumeric has an example of this.
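
Roughly, something along these lines (the class name, return type, and property key below are
just placeholders for illustration, not anything from your actual UDF):

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.session.SessionState;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyConfigurableUDF extends GenericUDF {
  private String customSetting;  // placeholder for whatever your UDF needs from the conf

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    SessionState ss = SessionState.get();
    if (ss != null) {
      // A non-null SessionState means this initialize() call is happening during
      // query compilation, so the session conf is available here.
      HiveConf conf = ss.getConf();
      customSetting = conf.get("xy.abc.something");
    }
    // Return type is just an example; use whatever your UDF actually returns.
    return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    return customSetting;  // placeholder for the real work
  }

  @Override
  public String getDisplayString(String[] children) {
    return "my_configurable_udf()";
  }
}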



As for trying to force map/reduce jobs, you can try setting hive.fetch.task.conversion=minimal
(or none) and hive.optimize.constant.propagation=false and see how it works.
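
For example, at the session level (whether the value 'none' is accepted depends on the Hive
version):

set hive.fetch.task.conversion=none;
set hive.optimize.constant.propagation=false;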


________________________________
From: Rahul Sharma <kippy.pie@gmail.com>
Sent: Tuesday, August 25, 2015 2:48 PM
To: user@hive.apache.org
Subject: Re: UDF Configure method not getting called

Or alternatively, is there a way to pass configuration without using the configure method?

The configuration to the UDF is essentially a list of parameters that tells the UDF what
it should morph into this time and what kind of work it should perform. If there is an
all-encompassing way to do that, then I can modify the UDF to run the same way regardless of
whether it runs locally or with a MapRed context.

On Tue, Aug 25, 2015 at 2:44 PM, Rahul Sharma <kippy.pie@gmail.com> wrote:
Oh thanks for the reply, Jason. That was my suspicion too.

The UDF in our case is not a function per se, in the pure mathematical sense of the word
'function'. That is because it doesn't take in a value and give out another value; it has side
effects that form the input for another MapReduce job. The point of doing it this way is that
we wanted to make use of the parallelism afforded by running it as a map/reduce job via Hive,
as the processing is fairly compute-intensive.

Is there a way to force map-reduce jobs? I think setting hive.fetch.task.conversion to minimal
might help; is there anything else that can be done?

Thanks a ton.

On Tue, Aug 25, 2015 at 2:36 PM, Jason Dere <jdere@hortonworks.com> wrote:

There might be a few cases where a UDF is executed locally and not as part of a Map/Reduce
job:

 - Hive might choose not to run an M/R task for your query (see hive.fetch.task.conversion)

 - If the UDF is deterministic and has deterministic inputs, Hive might decide to run the
UDF once to get the value and use constant folding to replace calls of that UDF with the value
from the one UDF call (see hive.optimize.constant.propagation)


Taking a look at the explain plan for your query might confirm this. In those cases the UDF
would not run within an M/R task and configure() would not be called.
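
For example, something like the following (table, column, and UDF names are placeholders),
checking whether the UDF call appears inside a map/reduce stage, only in a fetch task, or has
already been replaced by a constant:

explain select my_udf(col1) from my_table;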



________________________________
From: Rahul Sharma <kippy.pie@gmail.com>
Sent: Tuesday, August 25, 2015 11:32 AM
To: user@hive.apache.org
Subject: UDF Configure method not getting called

Hi Guys,

We have a UDF which extends GenericUDF and does some configuration within the public void
configure(MapredContext ctx) method.

The MapredContext in the configure method gives access to the Hive configuration via JobConf,
which contains custom attributes of the form xy.abc.something. Reading these values is
required for the semantics of the UDF.
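
For reference, the relevant part looks roughly like this (the class name and return type are
simplified placeholders; the property key is just an example of the form):

import org.apache.hadoop.hive.ql.exec.MapredContext;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class OurUDF extends GenericUDF {
  private String customSetting;

  @Override
  public void configure(MapredContext ctx) {
    // The JobConf carries the job's Hive configuration, including our custom attributes.
    customSetting = ctx.getJobConf().get("xy.abc.something");
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // The actual behavior depends on customSetting; omitted here.
    return customSetting;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "our_udf()";
  }
}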

Everything works fine up to Hive 0.13; however, with Hive 0.14 (or 1.0) the configure method
of the UDF is never called by the runtime, and hence the UDF cannot configure itself
dynamically.

Is this the intended behavior? If so, what is the new way to read the configuration of the
MapReduce job within the UDF?

I would be grateful for any help.


