hadoop-hive-dev mailing list archives

From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1016) Ability to access DistributedCache from UDFs
Date Mon, 28 Dec 2009 20:35:29 GMT
Ability to access DistributedCache from UDFs
--------------------------------------------

                 Key: HIVE-1016
                 URL: https://issues.apache.org/jira/browse/HIVE-1016
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Carl Steinbach
            Assignee: Carl Steinbach


There have been several requests on the mailing list for
information about how to access the DistributedCache from UDFs, e.g.:

http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01650.html
http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01926.html

While responses to these emails suggested several workarounds, the only correct
way of accessing the distributed cache is via the static methods of Hadoop's
DistributedCache class, and all of these methods require that the JobConf be passed
in as a parameter. Hence, giving UDFs access to the distributed cache
reduces to giving UDFs access to the JobConf.

I propose the following changes to GenericUDF/UDAF/UDTF:

* Add an exec_init(Configuration conf) method that is called during Operator initialization
at runtime.
* Change the name of the "initialize" method to "compile_init" to make it clear that this
method is called at compile-time.
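
As a rough illustration of the two-phase lifecycle proposed above, here is a
minimal, self-contained sketch. It is not the actual Hive API: the TwoPhaseUdf
and CacheAwareUdf classes and the "example.cache.files" key are hypothetical,
and java.util.Properties stands in for Hadoop's Configuration so the example
compiles without Hadoop on the classpath. The method names follow the proposal
text rather than Java naming conventions.

```java
import java.util.Properties;

/**
 * Hypothetical base class showing the proposed split between
 * compile-time and runtime initialization. Not part of Hive.
 */
abstract class TwoPhaseUdf {
    /** Compile-time hook: the proposed new name for "initialize". */
    abstract void compile_init();

    /**
     * Runtime hook, called once during Operator initialization.
     * Properties is an assumed stand-in for Hadoop's Configuration.
     */
    abstract void exec_init(Properties conf);
}

/** Hypothetical UDF that records the cached file paths it is given. */
class CacheAwareUdf extends TwoPhaseUdf {
    // Assumed key for this sketch only; not a real Hadoop property name.
    static final String CACHE_FILES_KEY = "example.cache.files";

    private boolean compiled = false;
    private String[] cacheFiles = new String[0];

    @Override
    void compile_init() {
        // Only compile-time work belongs here; no job configuration exists yet.
        compiled = true;
    }

    @Override
    void exec_init(Properties conf) {
        // With a real Configuration, this is where a UDF could call
        // DistributedCache.getLocalCacheFiles(conf) to locate cached files.
        String raw = conf.getProperty(CACHE_FILES_KEY, "");
        cacheFiles = raw.isEmpty() ? new String[0] : raw.split(",");
    }

    boolean isCompiled() {
        return compiled;
    }

    String[] getCacheFiles() {
        return cacheFiles.clone();
    }
}

class UdfLifecycleDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(CacheAwareUdf.CACHE_FILES_KEY, "/tmp/a.txt,/tmp/b.txt");

        CacheAwareUdf udf = new CacheAwareUdf();
        udf.compile_init();   // compile-time phase
        udf.exec_init(conf);  // runtime phase, configuration now available

        for (String path : udf.getCacheFiles()) {
            System.out.println(path);
        }
    }
}
```

The point of the split is that the configuration only reaches the UDF in the
runtime phase, so any lookup of cached files has to happen in exec_init rather
than in compile_init.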

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.