drill-dev mailing list archives

From Jim Bates <jba...@maprtech.com>
Subject Re: Some questions on UDFs
Date Sat, 04 Jul 2015 23:08:15 GMT
I'm working on the same thing: I want to aggregate values into a list. It
has been a search-and-guess game for the most part, and I'm still stuck on
getting all the values into a list. The writers look interesting, but for
aggregation functions it looks like the input is the @Param, and the @Output
objects can't hold the intermediate aggregation steps. The @Workspace is
where that happens. If I try to use a Writer in a @Workspace, the function
won't load and I'm told to change it to Holders, which is why I was using
Holders to start with. Maybe I'm misunderstanding the architecture of the
agg function. It looked like it was:

@Param comes in -> initialize @Workspace vars in setup -> process data
through @Workspace vars in add -> finalize @Output in output.
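
The flow above can be sketched in plain Java to make the role of each step concrete. This is only a stand-in under stated assumptions: ListAggSketch and its fields are hypothetical names, it uses an ordinary java.util.List where a real UDF would need Holders or a writer, and it does not touch any Drill class. The reset() step is not in the flow above; it is included on the assumption that an aggregate needs some way to clear its state between groups.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the aggregate lifecycle; none of this is Drill API.
class ListAggSketch {
    // Plays the role of a @Workspace field: state that survives across add() calls.
    private List<Long> workspace;

    // setup(): initialize the workspace once, before any input arrives.
    void setup() {
        workspace = new ArrayList<>();
    }

    // add(): called once per input value (the @Param); accumulate into the workspace.
    void add(long param) {
        workspace.add(param);
    }

    // output(): turn the accumulated workspace into the final result (the @Output).
    List<Long> output() {
        return new ArrayList<>(workspace);
    }

    // reset(): clear the workspace so the next group starts empty (assumed step).
    void reset() {
        workspace.clear();
    }
}
```

The point of the sketch is that only the workspace carries state between calls, which is why the question of what types a @Workspace may hold matters so much here.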

So I'm back to trying to figure out how to create a RepeatedBigIntHolder or
a RepeatedVarCharHolder...



On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> I am working on trying to build any kind of list constructing aggregator
> and having absolute fits.
>
> To simplify life, I decided to just build a generic list builder that is a
> scalar function that returns a list containing its argument.  Thus zoop(3)
> => [3], zoop('abc') => ['abc'] and zoop([1,2,3]) => [[1,2,3]].
>
> The ComplexWriter looks like the place to go. As usual, the complete lack
> of comments in most of Drill makes this very hard since I have to guess
> what works and what doesn't.
>
> In my code, I note that ComplexWriter has a nice rootAsList() method.  I
> used this in zip and it works nicely to construct lists for output.  I note
> that the resulting ListWriter has a method copyReader(FieldReader var1)
> which looks really good.
>
> Unfortunately, the only implementation of copyReader() is in
> AbstractFieldWriter and it looks like this:
>
> public void copyReader(FieldReader reader) {
>     this.fail("Copy FieldReader");
> }
>
> I would like to formally say at this point "WTF"?
>
> In digging in further, I see other methods that look handy like
>
> public void write(IntHolder holder) {
>     this.fail("Int");
> }
>
> And then in looking at implementations, it looks like there is a
> combinatorial explosion because every type seems to need a write method for
> every other type.
>
> What is the thought here?  How can I copy an arbitrary value into a list?
>
> My next thought was to build code that dispatches on type.  There is a
> method called getType() on the FieldReader.  Unfortunately, that drives
> into code generated by protoc and I see no way to dispatch on the type of
> an incoming value.
>
>
> How is this supposed to work?
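
As an illustration of the "dispatch on type" idea in the quoted message, here is a minimal plain-Java sketch. Everything in it is hypothetical: KindSketch, ValueSketch and CopySketch are stand-ins invented for this example and are not Drill's FieldReader, ListWriter or generated type classes. It assumes the incoming value's type can be read as an enum tag, and it shows one switch per value rather than one write method per pair of types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical type tag; a stand-in for whatever the reader's type exposes.
enum KindSketch { BIGINT, VARCHAR, LIST }

// Hypothetical tagged value; a stand-in for an incoming reader.
class ValueSketch {
    final KindSketch kind;
    final Object payload;

    ValueSketch(KindSketch kind, Object payload) {
        this.kind = kind;
        this.payload = payload;
    }
}

class CopySketch {
    // Copies one value into `out`, choosing the branch from the value's type tag.
    static void copyInto(List<Object> out, ValueSketch v) {
        switch (v.kind) {
            case BIGINT: {
                out.add((Long) v.payload);
                break;
            }
            case VARCHAR: {
                out.add((String) v.payload);
                break;
            }
            case LIST: {
                // Nested lists recurse, so [1,2,3] is copied as a single inner list.
                List<Object> inner = new ArrayList<>();
                for (Object element : (List<?>) v.payload) {
                    copyInto(inner, (ValueSketch) element);
                }
                out.add(inner);
                break;
            }
            default:
                throw new IllegalArgumentException("unsupported kind: " + v.kind);
        }
    }
}
```

Whether Drill's generated code allows this kind of switch inside a UDF is exactly the open question in the thread; the sketch only shows the shape such a copy routine would have.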
>
>
>
>
> On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <baid.mehant@gmail.com> wrote:
>
> > For a detailed example on using ComplexWriter interface you can take a
> > look at the Mappify
> > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>
> > (kvgen) function. The function itself is very simple however it makes use
> > of the utility methods in MappifyUtility
> > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
> > and MapUtility
> > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>
> > which perform most of the work.
> >
> > Currently we don't have a generic infrastructure to handle errors coming
> > out of functions. However there is UserException, which when raised will
> > make sure that Drill does not gobble up the error message in that
> > exception. So you can probably throw a UserException with the failing
> > input in your function to make sure it propagates to the user.
> >
> > Thanks
> > Mehant
> >
> > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <jacques@apache.org>
> > > wrote:
> > >
> > > > *Holders are for both input and output.  You can also use
> > > > ComplexWriter for output and FieldReader for input if you want to
> > > > write or read a complex value.
> > > >
> > > > I don't think we've provided a really clean way to construct a
> > > > Repeated*Holder for output purposes.  You can probably do it by
> > > > reaching into a bunch of internal interfaces in Drill.  However, I
> > > > would recommend using the ComplexWriter output pattern for now.  This
> > > > will be a little less efficient but substantially less brittle.  I
> > > > suggest you open up a jira for using a Repeated*Holder as an output.
> > > >
> > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > wrote:
> > >
> > > > Holders are for input, I think.
> > > >
> > > > Try the different kinds of writers.
> > > >
> > > >
> > > >
> > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <jbates@maprtech.com>
> > > > > wrote:
> > > > >
> > > > > > Using a RepeatedHolder as a @Param I've got working. I was
> > > > > > working on a custom aggregator function using DrillAggFunc. In
> > > > > > this I can do simple things, but if I want to build a list of
> > > > > > values and do something with it in the final output method, I
> > > > > > think I need to use RepeatedHolders in the @Workspace. To do that
> > > > > > I need to create a new one in the setup method. I can't get one
> > > > > > built. They all require a BufferAllocator to be passed in to
> > > > > > build it. I have not found a way to get an allocator yet. Any
> > > > > > suggestions?
> > > > >
> > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > If you look at the zip function in
> > > > > > > https://github.com/mapr-demos/simple-drill-functions you can
> > > > > > > see an example of building a structure.
> > > > > > >
> > > > > > > The basic idea is that your output is denoted as
> > > > > > >
> > > > > > >         @Output
> > > > > > >         BaseWriter.ComplexWriter writer;
> > > > > > >
> > > > > > > The pattern for building a list of lists of integers is like
> > > > > > > this:
> > > > > >
> > > > > > >         writer.setValueCount(n);
> > > > > > >         ...
> > > > > > >         BaseWriter.ListWriter outer = writer.rootAsList();
> > > > > > >         outer.start(); // [ outer list
> > > > > > >         ...
> > > > > > >         for (...) { // each inner list
> > > > > > >             BaseWriter.ListWriter inner = outer.list();
> > > > > > >             inner.start(); // [ inner list
> > > > > > >             for (...) { // each inner list element
> > > > > > >                 inner.integer().writeInt(accessor.get(i));
> > > > > > >             }
> > > > > > >             inner.end();   // ] inner list
> > > > > > >         }
> > > > > > >         outer.end(); // ] outer list
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates <jbates@maprtech.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I have working aggregation and simple UDFs. I've been trying
> > > > > > > > to document and understand each of the options available in
> > > > > > > > a Drill UDF: the different FunctionScopes, which ones are
> > > > > > > > allowed and which are not; the impact of the different cost
> > > > > > > > categories; and the steps needed to handle any of the
> > > > > > > > supported data types and structures in Drill.
> > > > > > > >
> > > > > > > > Here are a few of my current road blocks. Any pointers would
> > > > > > > > be greatly appreciated.
> > > > > > > >
> > > > > > > >
> > > > > > > >    1. I've been trying to understand how to correctly use
> > > > > > > >    RepeatedHolders of whatever type. For this discussion
> > > > > > > >    let's start with a RepeatedBigIntHolder. I'm trying to
> > > > > > > >    figure out the best way to create a new one. I have not
> > > > > > > >    figured out where in the existing Drill code someone does
> > > > > > > >    this. If I use a RepeatedBigIntHolder as a Workspace
> > > > > > > >    object, it is null to start with. I created a new one in
> > > > > > > >    the setup section of the UDF, but the vector was null. I
> > > > > > > >    can find no example of creating a new BigIntVector. There
> > > > > > > >    is a way to create one, and I did find an example of
> > > > > > > >    creating a new VarCharVector, but I can't do that using
> > > > > > > >    the Drill jar files from 1.0. The
> > > > > > > >    org.apache.drill.common.types.TypeProtos and the
> > > > > > > >    org.apache.drill.common.types.TypeProtos.MinorType
> > > > > > > >    classes do not appear to be accessible from the Drill jar
> > > > > > > >    files.
> > > > > > > >    2. What is the best way to close out a UDF in the event
> > > > > > > >    it generates an exception? Are there specific steps one
> > > > > > > >    should follow to make a clean exit in a catch block that
> > > > > > > >    are beneficial to Drill?
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
