hadoop-hive-dev mailing list archives

From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1016) Ability to access DistributedCache from UDFs
Date Mon, 30 Aug 2010 23:08:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904400#action_12904400 ]

Carl Steinbach commented on HIVE-1016:
--------------------------------------

@Namit: I initially preferred that approach too, and I think it would make sense if all of
the UDF classes inherited from the same abstract base class. However, we have a bunch of
unrelated UDF base classes: UDF, UDAF, GenericUDF, GenericUDAFEvaluator (which already has a
runtime init() method), and GenericUDTF. Taking the approach you suggested would require
modifications to all of these classes as well as the code that calls them. I also think it's
likely that we'll want to make more runtime context available to UDFs in the future, and it's
easier to proxy this through the UDFContext singleton than to keep adding methods to each of
the different UDF base classes.
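
To illustrate the trade-off, here is a minimal, self-contained sketch of the singleton
approach. It is not the code in the attached patch: only the name UDFContext comes from this
discussion, while the setConf/getConf methods, the LookupUdf class, and the
"example.lookup.path" key are hypothetical, and a plain Map stands in for Hadoop's JobConf so
that the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a process-wide context object. A Map<String, String> stands in
// for Hadoop's JobConf (assumption made so the example has no dependencies).
final class UDFContext {
    private static final UDFContext INSTANCE = new UDFContext();
    private final Map<String, String> conf = new HashMap<>();

    private UDFContext() {}

    static UDFContext get() {
        return INSTANCE;
    }

    // Hypothetical hook: the framework would call this once per task,
    // before any UDF is evaluated.
    synchronized void setConf(Map<String, String> jobConf) {
        conf.clear();
        conf.putAll(jobConf);
    }

    // Hypothetical accessor used by UDFs at runtime.
    synchronized String getConf(String key) {
        return conf.get(key);
    }
}

// A UDF that shares no base class with other UDF kinds can still reach the
// runtime configuration, because it goes through the singleton.
final class LookupUdf {
    String evaluate(String input) {
        String path = UDFContext.get().getConf("example.lookup.path");
        return input + "@" + path;
    }
}

class UdfContextDemo {
    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("example.lookup.path", "/cache/lookup.txt");
        UDFContext.get().setConf(jobConf);
        System.out.println(new LookupUdf().evaluate("key"));
    }
}
```

Exposing more runtime context later would mean adding one accessor to UDFContext rather than
a new method on every UDF base class.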

> Ability to access DistributedCache from UDFs
> --------------------------------------------
>
>                 Key: HIVE-1016
>                 URL: https://issues.apache.org/jira/browse/HIVE-1016
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
>            Assignee: Carl Steinbach
>         Attachments: HIVE-1016.1.patch.txt
>
>
> There have been several requests on the mailing list for
> information about how to access the DistributedCache from UDFs, e.g.:
> http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01650.html
> http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01926.html
> While responses to these emails suggested several workarounds, the only correct
> way of accessing the distributed cache is via the static methods of Hadoop's
> DistributedCache class, and all of these methods require that the JobConf be passed
> in as a parameter. Hence, giving UDFs access to the distributed cache
> reduces to giving UDFs access to the JobConf.
> I propose the following changes to GenericUDF/UDAF/UDTF:
> * Add an exec_init(Configuration conf) method that is called during Operator initialization
>   at runtime.
> * Change the name of the "initialize" method to "compile_init" to make it clear that
>   this method is called at compile-time.
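
For comparison, the two-phase lifecycle proposed in the issue description could look roughly
like the sketch below. Only the method names exec_init and compile_init come from the
proposal. The TwoPhaseUdf and PrefixUdf classes, the "example.prefix" key, and the argument
types are hypothetical, and java.util.Properties stands in for Hadoop's Configuration so that
the example is self-contained.

```java
import java.util.Properties;

// Hypothetical base class with separate compile-time and runtime hooks.
abstract class TwoPhaseUdf {
    // Called at compile time, for example to validate argument types.
    abstract void compile_init(String[] argumentTypes);

    // Called at runtime during operator initialization. Properties stands in
    // for Hadoop's Configuration (assumption). The default does nothing.
    void exec_init(Properties conf) {}

    abstract Object evaluate(Object[] args);
}

// Example UDF that reads a setting once at runtime and uses it per row.
final class PrefixUdf extends TwoPhaseUdf {
    private String prefix = "";

    @Override
    void compile_init(String[] argumentTypes) {
        if (argumentTypes.length != 1) {
            throw new IllegalArgumentException("expects exactly one argument");
        }
    }

    @Override
    void exec_init(Properties conf) {
        prefix = conf.getProperty("example.prefix", "");
    }

    @Override
    Object evaluate(Object[] args) {
        return prefix + args[0];
    }
}

class TwoPhaseDemo {
    public static void main(String[] args) {
        PrefixUdf udf = new PrefixUdf();
        udf.compile_init(new String[] {"string"});   // compile-time phase

        Properties conf = new Properties();
        conf.setProperty("example.prefix", "p_");
        udf.exec_init(conf);                         // runtime phase

        System.out.println(udf.evaluate(new Object[] {"x"}));
    }
}
```

Every UDF base class would need an equivalent pair of hooks under this design, along with
matching changes in the code that calls them.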

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

