hadoop-pig-dev mailing list archives

From "Tanton Gibbs" <tanton.gi...@gmail.com>
Subject Re: UDFs and types
Date Tue, 08 Jul 2008 03:07:45 GMT
What about using annotations for this?

Could we create an annotation, say @UDF, that allowed us to specify an
input schema?

I imagine you could put quite a bit of information into the annotation
such as function name, input args, return type, etc...
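A sketch of what such an annotation might look like. None of these names exist in Pig; @UDF, its members, and the example class below are hypothetical, purely to illustrate the suggestion:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation carrying the UDF's type information,
// declared by the UDF itself rather than in the script.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UDF {
    String name();            // function name as used in scripts
    String[] inputSchema();   // e.g. {"chararray", "chararray"}
    String returnType();      // e.g. "chararray"
}

// Example use on a concat-style UDF (class body elided):
@UDF(name = "CONCAT",
     inputSchema = {"chararray", "chararray"},
     returnType = "chararray")
class StringConcat { }
```

The type checker could then read this metadata via reflection instead of requiring the script author to declare types in DEFINE.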

On Wed, Jul 2, 2008 at 3:43 PM, Alan Gates <gates@yahoo-inc.com> wrote:
> With the introduction of types (see
> http://issues.apache.org/jira/browse/PIG-157) we need to decide how EvalFunc
> will interact with the types.  The original proposal was that the DEFINE
> keyword would be modified to allow specification of types for the UDF.  This
> has a couple of problems.  One, DEFINE is already used to specify
> constructor arguments.  Using it to also specify types will be confusing.
>  Two, it has been pointed out that this type information is a property of
> the UDF and should therefore be declared by the UDF, not in the script.
> Separately, as a way to allow simple function overloading, a change had been
> proposed to the EvalFunc interface to allow an EvalFunc to specify that for
> a given type, a different instance of EvalFunc should be used (see
> https://issues.apache.org/jira/browse/PIG-276).
> I would like to propose that we expand the changes in PIG-276 to be more
> general.  Rather than adding classForType() as proposed in PIG-276, EvalFunc
> will instead add a function:
> public Map<Schema, FuncSpec> getArgToFuncMapping() {
>   return null;
> }
> Where FuncSpec is a new class that contains the name of the class that
> implements the UDF along with any necessary arguments for the constructor.
> The type checker will then, as part of type checking LOUserFunc, make a call
> to this function.  If it receives a null, it will simply leave the UDF as
> is, and make the assumption that the UDF can handle whatever datatype is
> being provided to it.  This will cover most existing UDFs, which will not
> override the default implementation.
> If a UDF wants to override the default, it should return a map that gives a
> FuncSpec for each type of schema that it can support.  For example, for the
> UDF concat, the map would have two entries:
> key: schema(chararray, chararray) value: StringConcat
> key: schema(bytearray, bytearray) value: ByteConcat
> The type checker will then take the schema of what is being passed to it and
> perform a lookup in the map.  If it finds an entry, it will use the
> associated FuncSpec.  If it does not, it will throw an exception saying that
> the EvalFunc cannot be used with those types.
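
The concat example above could be sketched as follows. The real Schema and FuncSpec types would come from Pig; the minimal stand-ins here (a Schema keyed on field-type names, a FuncSpec holding only a class name) are assumptions made solely so the sketch is self-contained:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Stand-in for Pig's Schema: just an ordered list of field type names,
// with equals/hashCode so it can serve as a map key.
class Schema {
    private final String[] fieldTypes;
    Schema(String... fieldTypes) { this.fieldTypes = fieldTypes; }
    @Override public boolean equals(Object o) {
        return o instanceof Schema
            && Arrays.equals(fieldTypes, ((Schema) o).fieldTypes);
    }
    @Override public int hashCode() { return Arrays.hashCode(fieldTypes); }
}

// Stand-in for the proposed FuncSpec: the implementing class name
// (constructor arguments omitted for brevity).
class FuncSpec {
    final String className;
    FuncSpec(String className) { this.className = className; }
}

// A concat-style UDF overriding the proposed default (which returns null):
class Concat {
    public Map<Schema, FuncSpec> getArgToFuncMapping() {
        Map<Schema, FuncSpec> m = new HashMap<Schema, FuncSpec>();
        m.put(new Schema("chararray", "chararray"), new FuncSpec("StringConcat"));
        m.put(new Schema("bytearray", "bytearray"), new FuncSpec("ByteConcat"));
        return m;
    }
}
```

A lookup with the actual argument schema either yields the FuncSpec to substitute, or returns null, in which case the type checker would raise the "cannot be used with those types" error.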
> At this point, the type checker will make no effort to find a best fit
> function.  Either the fit is perfect, or it will not be done.  In the future
> we would like to modify the type checker to select a best fit.  For example,
> if a UDF says it can handle schema(long) and the type checker finds it has
> schema(int), it can insert a cast to deal with that.  But in the first pass
> we will ignore this and depend on the user to insert the casts.
> Thoughts?
> Alan.
