hadoop-pig-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject Re: UDFs and types
Date Thu, 03 Jul 2008 11:55:01 GMT
+1 Agree.

I will try to make "best fit" happen within 24 hours of your committing the
new UDF design.


On Thu, Jul 3, 2008 at 6:55 AM, Olga Natkovich <olgan@yahoo-inc.com> wrote:

> Sounds good to me.
>
> Olga
>
> > -----Original Message-----
> > From: Alan Gates [mailto:gates@yahoo-inc.com]
> > Sent: Wednesday, July 02, 2008 1:44 PM
> > To: pig-dev@incubator.apache.org
> > Subject: UDFs and types
> >
> > With the introduction of types (see
> > http://issues.apache.org/jira/browse/PIG-157) we need to
> > decide how EvalFunc will interact with the types.  The
> > original proposal was that the DEFINE keyword would be
> > modified to allow specification of types for the UDF.  This
> > has a couple of problems.  One, DEFINE is already used to
> > specify constructor arguments.  Using it to also specify
> > types will be confusing.  Two, it has been pointed out that
> > this type information is a property of the UDF and should
> > therefore be declared by the UDF, not in the script.
> >
> > Separately, as a way to allow simple function overloading, a
> > change had been proposed to the EvalFunc interface to allow
> > an EvalFunc to specify that for a given type, a different
> > instance of EvalFunc should be used (see
> > https://issues.apache.org/jira/browse/PIG-276).
> >
> > I would like to propose that we expand the changes in PIG-276
> > to be more general.  Rather than adding classForType() as
> > proposed in PIG-276, EvalFunc will instead add a function:
> >
> > public Map<Schema, FuncSpec> getArgToFuncMapping() {
> >     return null;
> > }
> >
> > Where FuncSpec is a new class that contains the name of the
> > class that implements the UDF along with any necessary
> > arguments for the constructor.
> >
> > The type checker will then, as part of type checking
> > LOUserFunc, make a call to this function.  If it receives a
> > null, it will simply leave the UDF as is, and make the
> > assumption that the UDF can handle whatever datatype is being
> > provided to it.  This will cover most existing UDFs, which
> > will not override the default implementation.
> >
> > If a UDF wants to override the default, it should return a
> > map that gives a FuncSpec for each type of schema that it can
> > support.  For example, for the UDF concat, the map would have
> > two entries:
> > key: schema(chararray, chararray) value: StringConcat
> > key: schema(bytearray, bytearray) value: ByteConcat
> >
> > The type checker will then take the schema of what is being
> > passed to it and perform a lookup in the map.  If it finds an
> > entry, it will use the associated FuncSpec.  If it does not,
> > it will throw an exception saying that that EvalFunc cannot
> > be used with those types.
> >
> > At this point, the type checker will make no effort to find a
> > best fit function.  Either the fit is perfect, or it will not
> > be done.  In the future we would like to modify the type
> > checker to select a best fit.
> > For example, if a UDF says it can handle schema(long) and the
> > type checker finds it has schema(int), it can insert a cast
> > to deal with that.  But in the first pass we will ignore this
> > and depend on the user to insert the casts.
> >
> > Thoughts?
> >
> > Alan.
> >
>
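
For illustration, the exact-match lookup described in the proposal above could be sketched roughly as follows. This is not Pig's code: Schema is stood in for by a plain list of type names, FuncSpec is reduced to a class name, and UdfResolutionSketch, Concat, and resolve() are hypothetical names made up for the example. Only getArgToFuncMapping(), its null default, and the two concat entries come from the message.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed lookup; none of these classes are Pig's.
class UdfResolutionSketch {

    // Stand-in for the proposed FuncSpec: only the implementing class name
    // (constructor arguments are omitted for brevity).
    static final class FuncSpec {
        final String className;
        FuncSpec(String className) { this.className = className; }
    }

    // Stand-in for EvalFunc. A schema is modelled as an ordered list of
    // type names, an assumption made purely for this example.
    static class EvalFunc {
        // Default from the proposal: null means "accepts any input types".
        Map<List<String>, FuncSpec> getArgToFuncMapping() {
            return null;
        }
    }

    // The concat example: one entry per supported argument schema.
    static class Concat extends EvalFunc {
        @Override
        Map<List<String>, FuncSpec> getArgToFuncMapping() {
            Map<List<String>, FuncSpec> mapping = new LinkedHashMap<>();
            mapping.put(Arrays.asList("chararray", "chararray"), new FuncSpec("StringConcat"));
            mapping.put(Arrays.asList("bytearray", "bytearray"), new FuncSpec("ByteConcat"));
            return mapping;
        }
    }

    // What the type checker would do for a UDF call: returns null when the
    // UDF should be left as is, the matching FuncSpec on an exact match,
    // and throws when the UDF declares a mapping that has no exact match.
    static FuncSpec resolve(EvalFunc udf, List<String> inputSchema) {
        Map<List<String>, FuncSpec> mapping = udf.getArgToFuncMapping();
        if (mapping == null) {
            return null; // no declared mapping: assume the UDF handles anything
        }
        FuncSpec spec = mapping.get(inputSchema); // exact match only, no best fit
        if (spec == null) {
            throw new IllegalArgumentException(
                udf.getClass().getSimpleName() + " cannot be used with types " + inputSchema);
        }
        return spec;
    }

    public static void main(String[] args) {
        FuncSpec spec = resolve(new Concat(), Arrays.asList("bytearray", "bytearray"));
        System.out.println(spec.className);
    }
}
```

In this sketch a call such as concat(int, int) is rejected rather than cast, which matches the "perfect fit or nothing" first pass; a best-fit step would slot in where the exact lookup fails.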
