hive-user mailing list archives

From Chris Losinger <>
Subject Q: UDFs & Threading
Date Thu, 17 Sep 2015 16:16:59 GMT

I'm writing some Hive UDFs, using JNI to talk to a native C library. The C library requires
some expensive initialization, and maintains its internal state via a handle. To avoid re-initializing
this library at every row, I initialize the library on the first row, then store the handle
as a static variable in the Java world and fetch that for subsequent rows. This is all working.
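
For concreteness, a minimal sketch of the pattern described above. The names here are hypothetical: nativeInit and nativeProcess are plain-Java stand-ins for the real JNI calls, the handle is assumed to be a long, and the Hive UDF base class is left out so the sketch compiles on its own.

```java
// Sketch of the "initialize on the first row, reuse the handle" pattern.
// nativeInit/nativeProcess are hypothetical stand-ins for the real JNI calls.
final class NativeHandleCache {
    // 0 means "not initialized yet"; the real library's handle type may differ.
    private static long handle = 0L;
    // Counts initializations so the demo can show it happens only once.
    static int initCount = 0;

    // Stand-in for the expensive native initialization.
    private static long nativeInit() {
        initCount++;
        return 42L; // pretend native handle
    }

    // Stand-in for the per-row native call; here it just measures the row.
    private static int nativeProcess(long h, String row) {
        return row.length();
    }

    // This would be the body of the UDF's evaluate() method.
    static int evaluate(String row) {
        if (handle == 0L) {
            handle = nativeInit(); // first row only
        }
        return nativeProcess(handle, row);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("first row"));
        System.out.println(evaluate("second row"));
        System.out.println("initializations: " + initCount);
    }
}
```

Note that nothing in this sketch ever releases the handle, which is exactly the gap described next.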

The tough part is that the library also requires the caller to do cleanup, to release that
internal state. Being Java, there are no destructors, of course. And I can't rely on 'finalize'.
So I can't figure out where to clean up this library.

Q 1: Is there anything in the Hive + UDF world that will tell my Java code when the query
is finished, so that I can clean up that library? Or, is there any Java mechanism that I can
use to do this?
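
One general Java mechanism that might apply is a JVM shutdown hook, registered with Runtime.getRuntime().addShutdownHook. The sketch below shows the idea under some loud assumptions: nativeRelease is a hypothetical stand-in for the library's cleanup call, and the hook runs when the JVM exits normally, not when the query finishes. Whether the JVM running the UDF actually exits that way in a given Hive deployment is not something this sketch can promise.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: release a native handle at JVM shutdown.
// nativeRelease is a hypothetical stand-in for the real JNI cleanup call.
final class NativeCleanup {
    private static final AtomicBoolean hookRegistered = new AtomicBoolean(false);
    private static final AtomicBoolean released = new AtomicBoolean(false);
    // Counts releases so the demo can show cleanup runs at most once.
    static int releaseCount = 0;

    private static void nativeRelease() {
        releaseCount++;
    }

    // Safe to call more than once; only the first call releases.
    static void release() {
        if (released.compareAndSet(false, true)) {
            nativeRelease();
        }
    }

    // Call this right after the library is initialized.
    // Returns true only for the call that actually registered the hook.
    static boolean registerHookOnce() {
        if (hookRegistered.compareAndSet(false, true)) {
            Runtime.getRuntime().addShutdownHook(new Thread(NativeCleanup::release));
            return true;
        }
        return false;
    }
}
```

Shutdown hooks do not run if the process is killed abruptly, so this is at best a partial answer to the cleanup question.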

I'm using the 'UDF' class, not 'GenericUDF', but I don't think that matters. I don't see anything
in either that looks like a cleanup, and GenericUDF's 'close' doesn't ever get called, AFAICT.

Q 2: Because I'm storing the library's internal state handle as a static variable in the Java
code, it would be available to any threads that use the Java code. That would be a problem.
So, my question is: Will a single UDF instance ever be accessed by more than one thread?
In other words, are UDFs thread-safe? Even if the query contains multiple UDF calls? I need
to know if my assumption about being able to store this C-library's state as a Java 'static'
is a safe assumption or not.
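
If that assumption turned out not to hold, one defensive option would be to guard the static handle with a lock, as in the sketch below. The names are hypothetical again (nativeInit stands in for the JNI initialization). The lock only makes the one-time initialization safe; it says nothing about whether the C library tolerates concurrent calls on a single handle.

```java
// Sketch: thread-safe lazy initialization of a shared static handle.
// nativeInit is a hypothetical stand-in for the real JNI call.
final class SharedHandle {
    private static long handle = 0L; // 0 means "not initialized yet"
    private static int initCount = 0;

    private static long nativeInit() {
        initCount++;
        return 7L; // pretend native handle
    }

    // synchronized on the class, so only one thread can run the init.
    static synchronized long get() {
        if (handle == 0L) {
            handle = nativeInit();
        }
        return handle;
    }

    static synchronized int initCount() {
        return initCount;
    }

    // Demo helper: calls get() from several threads, waits for them,
    // and returns how many initializations have happened in total.
    static int initsAfterConcurrentUse(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(SharedHandle::get);
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return initCount();
    }
}
```

A per-thread handle (for example via ThreadLocal) would be the alternative if the library cannot share one handle across threads, at the cost of one expensive initialization per thread.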

Thanks in advance

