hive-user mailing list archives

From "jipengzeng@meilishuo.com" <jipengz...@meilishuo.com>
Subject Re: Discussion: permanent UDF with database name
Date Fri, 18 Dec 2015 03:07:10 GMT
@ Furcy Pin
I agree with your idea!
I found that since Hive 0.13, users can define permanent UDFs, but each one must be bound to a database name.
So if we want to use a UDF without a database name, we must create it in every database.
This causes another problem: when we create a new database, we need to retrieve all of the UDFs we have already defined
and then create them one by one.
This is the biggest problem I have encountered when using them.
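The per-database workaround described above can at least be scripted. Below is a minimal sketch, assuming the list of databases is obtained separately (for example from the output of SHOW DATABASES) and that each UDF is described by a name, a class and a JAR URI; the function name create_function_statements and all names, classes and paths in it are hypothetical placeholders. It only generates HiveQL text in the CREATE FUNCTION ... USING JAR form introduced by HIVE-6167; it does not talk to Hive.

```python
# Hypothetical sketch: generate one CREATE FUNCTION statement per
# (database, UDF) pair so that each UDF can be called unqualified from any
# database. All database names, function names, classes and JAR URIs are
# placeholders, not real artifacts.

def create_function_statements(databases, udfs):
    """Return HiveQL CREATE FUNCTION statements for every (database, UDF) pair.

    databases: iterable of database names (e.g. parsed from SHOW DATABASES).
    udfs: iterable of (function_name, class_name, jar_uri) tuples.
    """
    udfs = list(udfs)  # allow a generator to be reused for every database
    statements = []
    for db in databases:
        for name, class_name, jar_uri in udfs:
            statements.append(
                f"CREATE FUNCTION {db}.{name} AS '{class_name}' "
                f"USING JAR '{jar_uri}';"
            )
    return statements


if __name__ == "__main__":
    example_udfs = [("my_upper", "com.example.hive.udf.MyUpper",
                     "hdfs:///libs/my-udfs.jar")]
    for stmt in create_function_statements(["default", "sales"], example_udfs):
        print(stmt)
```

When a new database is created, calling the same function with only that database's name produces the statements it is missing. This keeps the UDF list in one place, but it does not remove the underlying duplication that the thread is complaining about.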

jipengzeng



 
From: Furcy Pin
Date: 2015-12-17 20:14
To: user
Subject: Discussion: permanent UDF with database name
Hi Hive users,

I would like to continue the discussion that happened during the design of the feature:
https://issues.apache.org/jira/browse/HIVE-6167

Some concerns were raised back then, and I think that now that it has been implemented,
some user feedback could help inform the debate.

Even though I understand the usefulness of grouping UDFs inside databases, I find it really annoying
not to be able to define my UDFs globally.

For me, one of the main uses of UDFs is to extend the built-in Hive functions with the
company's user-defined functions, either because some useful generic functions are missing
from the built-in functions or to add business-specific functions.

In the latter case, I fully understand the need to qualify them with a business-specific
database name. But in the former case?


Let's take an example:
Several times we needed a Hive UDF that did not exist yet in the Hive
version we were running. To use it, all we had to do was take the UDF's source
code from a more recent version of Hive, build it into a JAR, and add the UDF manually.

When we upgraded, we only had to remove our UDF since it was now built-in.

(More specifically, this happened with collect_list prior to Hive 0.13.)

With HIVE-6167, this became impossible, since we have to create a "database_name.function_name"
and call it as such. Hence, when upgrading, we need to replace "database_name.function_name"
with "function_name" everywhere.
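That renaming step after an upgrade can be partially automated. Below is a minimal sketch, assuming queries are stored as plain text files and that database and function names are plain (unquoted) identifiers; unqualify_function is a hypothetical helper, and "udfs" is a placeholder database name. It is a naive regular-expression rewrite, not a HiveQL parser: it removes the qualifier only where the name is followed by an opening parenthesis, matches case-insensitively because Hive identifiers are case-insensitive, and will also touch matches inside string literals or comments.

```python
import re

# Hypothetical sketch: turn "udfs.collect_list(" into "collect_list(" once the
# function has become built-in. Naive text rewrite: it does not parse HiveQL,
# ignores backtick-quoted identifiers, and also rewrites matches inside string
# literals or comments.

def unqualify_function(sql, database, function):
    """Strip the `database.` qualifier from calls to `function` in `sql`."""
    pattern = (
        r"\b" + re.escape(database) + r"\."   # qualifier, on a word boundary
        + "(" + re.escape(function) + ")"      # capture the function name
        + r"(?=\s*\()"                         # only when it is being called
    )
    # Keep the function name exactly as written (group 1), drop the qualifier.
    return re.sub(pattern, r"\1", sql, flags=re.IGNORECASE)


if __name__ == "__main__":
    query = "SELECT y, udfs.collect_list(x) FROM t GROUP BY y"
    print(unqualify_function(query, "udfs", "collect_list"))
```

Requiring the trailing parenthesis avoids rewriting an unrelated column or table reference that happens to share the same dotted name.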

This is just an example, but I would like to emphasize that sometimes we want to
create permanent UDFs that are as global as built-in UDFs, without caring whether a function is built-in
or user-defined. As someone pointed out in HIVE-6167's discussion, imagine if all
the built-in UDFs had to be called with "sys.function_name".

I would just like to hear other Hive users' feedback on this matter.

Has anyone else had similar issues with this behavior? How did you handle them?

Maybe it would make sense to create a feature request for being able to specify a GLOBAL keyword
when creating a permanent UDF, when we really want it to be global?

What do you think?

Regards,

Furcy
