drill-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Some questions on UDFs
Date Sun, 05 Jul 2015 20:36:20 GMT
Uh... actually, I think that it isn't obvious because there is absolutely
no documentation and there are no comments in the code.

And what should the JIRA say?  We can't even tell what's missing, if
anything, because we can't tell how it is supposed to work.




On Sun, Jul 5, 2015 at 11:50 AM, Jacques Nadeau <jacques@apache.org> wrote:

> It isn't obvious because you shouldn't do it.  Please file a JIRA to add
> real support for this type of output.
>
> Your current function would leak large amounts of memory that would
> ultimately crash the node.
>
> Realistically, there are very few internal Drill APIs that you should
> access via a UDF (injectables, holders, complexwriter, fieldreader and
> helpers).  A post 1.0 goal was to provide a UDF interface JAR to ensure
> people don't accidentally reach into Drill's internals.  (A later
> possibility is bytecode weaving to completely protect against it).
>
> J
>
> On Sun, Jul 5, 2015 at 11:36 AM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
> > That was impressively non-obvious.
> >
> >
> >
> > On Sat, Jul 4, 2015 at 6:40 PM, Jim Bates <jbates@maprtech.com> wrote:
> >
> > > I did get a new RepeatedBigIntHolder built and a BigIntVector added to
> > > it. I'll try it in the UDF tomorrow and see if there is a difference
> > > between the ways I found to get a BufferAllocator.
> > >
> > > .
> > > .
> > > .
> > > @Inject DrillBuf buffer;
> > > @Workspace RepeatedBigIntHolder yList;
> > > .
> > > .
> > > .
> > > @Override
> > > public void setup() {
> > > .
> > > .
> > > .
> > > //org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
> > > org.apache.drill.exec.memory.BufferAllocator allocator =
> > >     new org.apache.drill.exec.memory.TopLevelAllocator();
> > > yList = new RepeatedBigIntHolder();
> > > yList.vector = new org.apache.drill.exec.vector.BigIntVector(
> > >     org.apache.drill.exec.record.MaterializedField.create(
> > >         new org.apache.drill.common.expression.SchemaPath("bigints",
> > >             org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
> > >         org.apache.drill.common.types.Types.optional(
> > >             org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
> > >     allocator);
> > > .
> > > .
> > > .
> > > }
> > >
> > >
> > >
> > > On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates <jbates@maprtech.com> wrote:
> > >
> > > > I still have issues finding the correct way to create and use a
> > > > RepeatedHolder, and Writers are a non-starter for Workspace values. I
> > > > can make do with creating a concatenated string in a VarCharHolder for
> > > > small data sets to get past this in the short term and finish testing
> > > > the output values I expect, but I won't be able to do anything at scale
> > > > until I figure out how to make a repeated list.
> > > >
> > > > On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > >
> > > >> Well... Converting from string to integers anyway... Too many 4th of
> > > >> July hot dogs. Going into nitrate overload. :)
> > > >>
> > > >> I am pulling an array of string values from json data. The string
> > > >> values are actually integers. I am converting to integers and summing
> > > >> each array entry to the final tally.
> > > >>
> > > >> On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > >>
> > > >>> Ted,
> > > >>>
> > > >>> Yes, I started out just getting a basic count to work. I am trying to
> > > >>> keep the workflow as close to that of a basic user as possible. As
> > > >>> such, I am building and using the MapR Apache Drill sandbox to test.
> > > >>>
> > > >>>
> > > >>>    1. Always look at the drillbit.log file to see if Drill had any
> > > >>>    issues loading your UDF. That was where I learned that all
> > > >>>    workspace values needed to be holders:
> > > >>>       - WARN  o.a.d.exec.expr.fn.FunctionConverter - Failure loading
> > > >>>       function class
> > > >>>       com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1,
> > > >>>       field xList. Aggregate function 'MyLinearRegression1' workspace
> > > >>>       variable 'xList' is of type 'interface
> > > >>>       org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
> > > >>>       Please change it to Holder type.
> > > >>>    2. Error messages:
> > > >>>       - If you get an error in this format, it means that Drill cannot
> > > >>>       find your function, so it probably didn't load it. Back to step 1:
> > > >>>          - PARSE ERROR: From line 1, column 8 to line 1, column 44: No
> > > >>>          match found for function signature MyFunctionName(<ANY>)
> > > >>>       - If you get an error in this format, it means that the function
> > > >>>       is there but Drill could not find a signature that matched the
> > > >>>       param types or param numbers you were passing it. The exact
> > > >>>       wording will change, but "Missing function implementation" is the
> > > >>>       key phrase to look for:
> > > >>>          - Error: SYSTEM ERROR:
> > > >>>          org.apache.drill.exec.exception.SchemaChangeException: Failure
> > > >>>          while trying to materialize incoming schema.  Errors:
> > > >>>          - Error in expression at index -1.  Error: Missing function
> > > >>>          implementation: [castBIGINT(VARCHAR-REPEATED)].  Full
> > > >>>          expression: --UNKNOWN EXPRESSION--
> > > >>>    3. In your function definition for aggregate functions you need
> > > >>>    to set null processing to internal and your isRandom to false.
> > > >>>    Example below:
> > > >>>       - @FunctionTemplate(name = "MyFunctionName", scope =
> > > >>>       FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> > > >>>       FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> > > >>>       isBinaryCommutative = false, costCategory =
> > > >>>       FunctionTemplate.FunctionCostCategory.COMPLEX)
> > > >>>
> > > >>> Below is an example from the Apache Drill tutorial data sets contained
> > > >>> in the MapR Apache Drill sandbox. I am pulling an array of string
> > > >>> values from json data. The string values are actually integers. I am
> > > >>> converting to integers and summing each array entry to the final tally.
> > > >>> This in no way represents what this data was for, but it did become a
> > > >>> handy way for me to peck out the "correct" way to build an aggregation
> > > >>> UDF.
> > > >>> @FunctionTemplate(name = "MyArraySum", scope =
> > > >>> FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> > > >>> FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> > > >>> isBinaryCommutative = false, costCategory =
> > > >>> FunctionTemplate.FunctionCostCategory.COMPLEX)
> > > >>> public static class MyArraySum implements DrillAggFunc {
> > > >>>
> > > >>>   @Param RepeatedVarCharHolder listToSearch;
> > > >>>   @Workspace NullableBigIntHolder count;
> > > >>>   @Workspace NullableBigIntHolder sum;
> > > >>>   @Workspace NullableVarCharHolder vc;
> > > >>>   @Output BigIntHolder out;
> > > >>>
> > > >>>   @Override
> > > >>>   public void setup() {
> > > >>>     count.value = 0;
> > > >>>     sum.value = 0;
> > > >>>   }
> > > >>>
> > > >>>   @Override
> > > >>>   public void add() {
> > > >>>     int val = 0;
> > > >>>     try {
> > > >>>       // start and end are offsets into the shared vector, so index
> > > >>>       // from start rather than from 0.
> > > >>>       for (int i = listToSearch.start; i < listToSearch.end; i++) {
> > > >>>         listToSearch.vector.getAccessor().get(i, vc);
> > > >>>         String inputStr =
> > > >>>             org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
> > > >>>                 .toStringFromUTF8(vc.start, vc.end, vc.buffer);
> > > >>>         val = Integer.parseInt(inputStr);
> > > >>>         sum.value = sum.value + val;
> > > >>>       }
> > > >>>     } catch (Exception e) {
> > > >>>       // A non-numeric entry ends the scan of this array.
> > > >>>       val = 0;
> > > >>>     }
> > > >>>     count.value = count.value + 1;
> > > >>>   }
> > > >>>
> > > >>> Example select statement:
> > > >>> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
> > > >>> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
> > > >>>
> > > >>> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >>>
> > > >>>> Jim,
> > > >>>>
> > > >>>> I think that you may be having trouble with aggregators in general.
> > > >>>>
> > > >>>> Have you been able to build *any* aggregator of anything?  I haven't.
> > > >>>>
> > > >>>> When I try to build an aggregator of ints or doubles, I get a very
> > > >>>> persistent problem with Drill even seeing my aggregates:
> > > >>>>
> > > >>>> 0: jdbc:drill:zk=local> *select sum_int(employee_id) from
> > > >>>> cp.`employee.json`;*
> > > >>>>
> > > >>>> Jul 04, 2015 4:19:35 PM
> > > >>>> org.apache.calcite.sql.validate.SqlValidatorException <init>
> > > >>>>
> > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> > > >>>> found for function signature sum_int(<ANY>)
> > > >>>>
> > > >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException
> > > >>>> <init>
> > > >>>>
> > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > >>>> column 8 to line 1, column 27: No match found for function signature
> > > >>>> sum_int(<ANY>)
> > > >>>>
> > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
> > > >>>> found for function signature sum_int(<ANY>)*
> > > >>>>
> > > >>>> *[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010]
> > > >>>> (state=,code=0)*
> > > >>>>
> > > >>>> 0: jdbc:drill:zk=local> *select sum_int(cast(employee_id as int)) from
> > > >>>> cp.`employee.json`*;
> > > >>>>
> > > >>>> Jul 04, 2015 4:19:45 PM
> > > >>>> org.apache.calcite.sql.validate.SqlValidatorException <init>
> > > >>>>
> > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> > > >>>> found for function signature sum_int(<NUMERIC>)
> > > >>>>
> > > >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException
> > > >>>> <init>
> > > >>>>
> > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > >>>> column 8 to line 1, column 40: No match found for function signature
> > > >>>> sum_int(<NUMERIC>)
> > > >>>>
> > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
> > > >>>> found for function signature sum_int(<NUMERIC>)*
> > > >>>>
> > > >>>> *[Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010]
> > > >>>> (state=,code=0)*
> > > >>>>
> > > >>>> 0: jdbc:drill:zk=local>
> > > >>>>
> > > >>>>
> > > >>>> It looks like there is some undocumented subtlety about how to register
> > > >>>> an aggregator.
> > > >>>>
> > > >>>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > >>>>
> > > >>>> > I'm working on the same thing. I want to aggregate a list of values.
> > > >>>> > It has been a search and guess game for the most part. I'm still stuck
> > > >>>> > in the process of getting the values all into a list. The writers look
> > > >>>> > interesting, but for aggregation functions it looks like the input is
> > > >>>> > the param and output objects can't hold the aggregation steps. The
> > > >>>> > Workspace is where that happens. If I try to use a Writer in a
> > > >>>> > workspace it won't load and tells me to change it to Holders, which was
> > > >>>> > why I was using them to start with. Maybe I'm missing the architecture
> > > >>>> > of the agg function. It looked like it was....
> > > >>>> >
> > > >>>> > @Param comes in -> initialize @Workspace vars in setup -> process data
> > > >>>> > through @Workspace vars in add -> finalize @Output in output.
> > > >>>> >
> > > >>>> > So I'm back to trying to figure out how to create a
> > > >>>> > RepeatedBigIntHolder or a RepeatedVarCharHolder...
> > > >>>> >
> > > >>>> >
> > > >>>> >
> > > >>>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >>>> >
> > > >>>> > > I am working on trying to build any kind of list-constructing
> > > >>>> > > aggregator and having absolute fits.
> > > >>>> > >
> > > >>>> > > To simplify life, I decided to just build a generic list builder that
> > > >>>> > > is a scalar function that returns a list containing its argument.
> > > >>>> > > Thus zoop(3) => [3], zoop('abc') => ['abc'] and
> > > >>>> > > zoop([1,2,3]) => [[1,2,3]].
> > > >>>> > >
> > > >>>> > > The ComplexWriter looks like the place to go. As usual, the complete
> > > >>>> > > lack of comments in most of Drill makes this very hard since I have to
> > > >>>> > > guess what works and what doesn't.
> > > >>>> > >
> > > >>>> > > In my code, I note that ComplexWriter has a nice rootAsList() method.
> > > >>>> > > I used this in zip and it works nicely to construct lists for output.
> > > >>>> > > I note that the resulting ListWriter has a method
> > > >>>> > > copyReader(FieldReader var1) which looks really good.
> > > >>>> > >
> > > >>>> > > Unfortunately, the only implementation of copyReader() is in
> > > >>>> > > AbstractFieldWriter and it looks like this:
> > > >>>> > >
> > > >>>> > > public void copyReader(FieldReader reader) {
> > > >>>> > >     this.fail("Copy FieldReader");
> > > >>>> > > }
> > > >>>> > >
> > > >>>> > > I would like to formally say at this point "WTF"?
> > > >>>> > >
> > > >>>> > > In digging in further, I see other methods that look handy like
> > > >>>> > >
> > > >>>> > > public void write(IntHolder holder) {
> > > >>>> > >     this.fail("Int");
> > > >>>> > > }
> > > >>>> > >
> > > >>>> > > And then in looking at implementations, it looks like there is a
> > > >>>> > > combinatorial explosion because every type seems to need a write
> > > >>>> > > method for every other type.
> > > >>>> > >
> > > >>>> > > What is the thought here?  How can I copy an arbitrary value into a
> > > >>>> > > list?
> > > >>>> > >
> > > >>>> > > My next thought was to build code that dispatches on type.  There is a
> > > >>>> > > method called getType() on the FieldReader.  Unfortunately, that
> > > >>>> > > drives into code generated by protoc and I see no way to dispatch on
> > > >>>> > > the type of an incoming value.
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > How is this supposed to work?
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <baid.mehant@gmail.com> wrote:
> > > >>>> > >
> > > >>>> > > > For a detailed example on using the ComplexWriter interface you can
> > > >>>> > > > take a look at the Mappify
> > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>
> > > >>>> > > > (kvgen) function. The function itself is very simple; however, it
> > > >>>> > > > makes use of the utility methods in MappifyUtility
> > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
> > > >>>> > > > and MapUtility
> > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>
> > > >>>> > > > which perform most of the work.
> > > >>>> > > >
> > > >>>> > > > Currently we don't have a generic infrastructure to handle errors
> > > >>>> > > > coming out of functions. However, there is UserException, which when
> > > >>>> > > > raised will make sure that Drill does not gobble up the error
> > > >>>> > > > message in that exception. So you can probably throw a UserException
> > > >>>> > > > with the failing input in your function to make sure it propagates
> > > >>>> > > > to the user.
> > > >>>> > > >
> > > >>>> > > > Thanks
> > > >>>> > > > Mehant
> > > >>>> > > >
> > > >>>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <jacques@apache.org> wrote:
> > > >>>> > > >
> > > >>>> > > > > *Holders are for both input and output.  You can also use
> > > >>>> > > > > ComplexWriter for output and FieldReader for input if you want to
> > > >>>> > > > > write or read a complex value.
> > > >>>> > > > >
> > > >>>> > > > > I don't think we've provided a really clean way to construct a
> > > >>>> > > > > Repeated*Holder for output purposes.  You can probably do it by
> > > >>>> > > > > reaching into a bunch of internal interfaces in Drill.  However, I
> > > >>>> > > > > would recommend using the ComplexWriter output pattern for now.
> > > >>>> > > > > This will be a little less efficient but substantially less
> > > >>>> > > > > brittle.  I suggest you open up a jira for using a Repeated*Holder
> > > >>>> > > > > as an output.
> > > >>>> > > > >
> > > >>>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >>>> > > > >
> > > >>>> > > > > > Holders are for input, I think.
> > > >>>> > > > > >
> > > >>>> > > > > > Try the different kinds of writers.
> > > >>>> > > > > >
> > > >>>> > > > > >
> > > >>>> > > > > >
> > > >>>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > >>>> > > > > >
> > > >>>> > > > > > > Using a RepeatedHolder as a @Param I've got working. I was
> > > >>>> > > > > > > working on a custom aggregator function using DrillAggFunc. In
> > > >>>> > > > > > > this I can do simple things, but if I want to build a list of
> > > >>>> > > > > > > values and do something with it in the final output method I
> > > >>>> > > > > > > think I need to use RepeatedHolders in the @Workspace. To do
> > > >>>> > > > > > > that I need to create a new one in the setup method. I can't
> > > >>>> > > > > > > get one built. They all require a BufferAllocator to be passed
> > > >>>> > > > > > > in to build it. I have not found a way to get an allocator
> > > >>>> > > > > > > yet. Any suggestions?
> > > >>>> > > > > > >
> > > >>>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >>>> > > > > > >
> > > >>>> > > > > > > > If you look at the zip function in
> > > >>>> > > > > > > > https://github.com/mapr-demos/simple-drill-functions you can
> > > >>>> > > > > > > > see an example of building a structure.
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > The basic idea is that your output is denoted as
> > > >>>> > > > > > > >
> > > >>>> > > > > > > >         @Output
> > > >>>> > > > > > > >         BaseWriter.ComplexWriter writer;
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > The pattern for building a list of lists of integers is like
> > > >>>> > > > > > > > this:
> > > >>>> > > > > > > >
> > > >>>> > > > > > > >         writer.setValueCount(n);
> > > >>>> > > > > > > >         ...
> > > >>>> > > > > > > >         BaseWriter.ListWriter outer = writer.rootAsList();
> > > >>>> > > > > > > >         outer.start(); // [ outer list
> > > >>>> > > > > > > >         ...
> > > >>>> > > > > > > >         // for each inner list {
> > > >>>> > > > > > > >             BaseWriter.ListWriter inner = outer.list();
> > > >>>> > > > > > > >             inner.start();
> > > >>>> > > > > > > >             // for each inner list element {
> > > >>>> > > > > > > >                 inner.integer().writeInt(accessor.get(i));
> > > >>>> > > > > > > >             }
> > > >>>> > > > > > > >             inner.end();   // ] inner list
> > > >>>> > > > > > > >         }
> > > >>>> > > > > > > >         outer.end(); // ] outer list
> > > >>>> > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates <jbates@maprtech.com> wrote:
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > > I have working aggregation and simple UDFs. I've been trying
> > > >>>> > > > > > > > > to document and understand each of the options available in
> > > >>>> > > > > > > > > a Drill UDF: the different FunctionScopes, the ones that
> > > >>>> > > > > > > > > are allowed and the ones that are not; the impact of
> > > >>>> > > > > > > > > different cost categories; the different steps needed to
> > > >>>> > > > > > > > > handle any of the supported data types and structures in
> > > >>>> > > > > > > > > Drill.
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > > Here are a few of my current roadblocks. Any pointers would
> > > >>>> > > > > > > > > be greatly appreciated.
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > >    1. I've been trying to understand how to correctly use
> > > >>>> > > > > > > > >    RepeatedHolders of whatever type. For this discussion
> > > >>>> > > > > > > > >    let's start with a RepeatedBigIntHolder. I'm trying to
> > > >>>> > > > > > > > >    figure out the best way to create a new one. I have not
> > > >>>> > > > > > > > >    figured out where in the existing Drill code someone does
> > > >>>> > > > > > > > >    this. If I use a RepeatedBigIntHolder as a Workspace
> > > >>>> > > > > > > > >    object it is null to start with. I created a new one in
> > > >>>> > > > > > > > >    the startup section of the UDF but the vector was null. I
> > > >>>> > > > > > > > >    can find no reference for creating a new BigIntVector.
> > > >>>> > > > > > > > >    There is a way to create a BigIntVector and I did find an
> > > >>>> > > > > > > > >    example of creating a new VarCharVector, but I can't do
> > > >>>> > > > > > > > >    that using the Drill jar files from 1.0. The
> > > >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos and the
> > > >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos.MinorType
> > > >>>> > > > > > > > >    classes do not appear to be accessible from the Drill jar
> > > >>>> > > > > > > > >    files.
> > > >>>> > > > > > > > >    2. What is the best way to close out a UDF in the event
> > > >>>> > > > > > > > >    it generates an exception? Are there specific steps one
> > > >>>> > > > > > > > >    should follow to make a clean exit in a catch block that
> > > >>>> > > > > > > > >    are beneficial to Drill?
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> >
>
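
The aggregate lifecycle described in the thread (@Param comes in, @Workspace
state is initialized in setup, updated in add, and published through @Output
in output) can be sketched in plain Java. This is only an illustration under
stated assumptions: ArraySumSketch, its fields and rowsSeen() are hypothetical
stand-ins, not Drill classes, and no Drill holders, vectors or annotations are
used. It also includes the output() and reset() steps that DrillAggFunc
declares but the MyArraySum listing in the thread does not show.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java stand-in for the aggregate lifecycle discussed in the thread.
 * Nothing here is Drill API: the class, its fields and the driver in main()
 * are hypothetical and exist only to show the order of the calls.
 */
public class ArraySumSketch {
    // These fields play the role of the @Workspace holders:
    // state that is carried from one input row to the next.
    private long sum;
    private long count;

    // setup(): initialize the workspace state before the first row.
    public void setup() {
        sum = 0;
        count = 0;
    }

    // add(): called once per input row; each row is an array of strings
    // that are expected to contain integers.
    public void add(List<String> row) {
        for (String s : row) {
            if (s == null) {
                continue; // treat a missing entry as contributing nothing
            }
            try {
                sum += Long.parseLong(s.trim());
            } catch (NumberFormatException e) {
                // Skip only the bad entry; later entries are still summed.
            }
        }
        count++;
    }

    // output(): publish the aggregate (the role of the @Output holder).
    public long output() {
        return sum;
    }

    // reset(): clear the state so the same instance can serve the next group.
    public void reset() {
        sum = 0;
        count = 0;
    }

    // Number of rows passed to add() since the last setup() or reset().
    public long rowsSeen() {
        return count;
    }

    public static void main(String[] args) {
        ArraySumSketch agg = new ArraySumSketch();
        agg.setup();
        agg.add(Arrays.asList("1", "2", "3"));
        agg.add(Arrays.asList("10", "oops", "20"));
        System.out.println(agg.output()); // prints 36
        agg.reset();
    }
}
```

Unlike the quoted listing, the try/catch here sits inside the loop, so one
non-numeric entry does not drop the remaining entries of the same array.
Whether that is the desired behavior is a design choice.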
