drill-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Some questions on UDFs
Date Sun, 05 Jul 2015 21:10:24 GMT
You're right.  You're off the beaten path. I think everyone here would love
to have more documentation and more comments. Of course, all of these take
time.

If you have time to volunteer to help improve these things, that would be
great.

With regard to the question about the JIRA, describe your use case and
what functionality you couldn't find or make work. The active developers on
the project can then do their best to help shape the JIRA into better docs,
javadocs and/or new functionality as time allows.

On Jul 5, 2015 1:37 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:

> Uh... actually, I think that it isn't obvious because there is absolutely
> no documentation and there are no comments in the code.
>
> And what should the JIRA say?  We can't even tell what's missing, if
> anything, because we can't tell how it is supposed to work.
>
>
>
>
> On Sun, Jul 5, 2015 at 11:50 AM, Jacques Nadeau <jacques@apache.org>
> wrote:
>
> > It isn't obvious because you shouldn't do it.  Please file a JIRA to add
> > real support for this type of output.
> >
> > Your current function would leak large amounts of memory that would
> > ultimately crash the node.
> >
> > Realistically, there are very few internal Drill APIs that you should
> > access via a UDF (injectables, holders, complexwriter, fieldreader and
> > helpers).  A post 1.0 goal was to provide a UDF interface JAR to ensure
> > people don't accidentally reach into Drill's internals.  (A later
> > possibility is bytecode weaving to completely protect against it).
> >
> > J
> >
> > On Sun, Jul 5, 2015 at 11:36 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > That was impressively non-obvious.
> > >
> > >
> > >
> > > On Sat, Jul 4, 2015 at 6:40 PM, Jim Bates <jbates@maprtech.com> wrote:
> > >
> > > > > I did get a new RepeatedBigIntHolder built and added a BigIntVector
> > > > > to it. I'll try it in the UDF tomorrow and see if there is a
> > > > > difference in the ways I found to get a BufferAllocator.
> > > >
> > > > .
> > > > .
> > > > .
> > > > @Inject DrillBuf buffer;
> > > > @Workspace RepeatedBigIntHolder yList;
> > > > .
> > > > .
> > > > .
> > > > @Override
> > > > public void setup() {
> > > > .
> > > > .
> > > > .
> > > > > //org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
> > > > > org.apache.drill.exec.memory.BufferAllocator allocator =
> > > > >     new org.apache.drill.exec.memory.TopLevelAllocator();
> > > > > yList = new RepeatedBigIntHolder();
> > > > > yList.vector = new org.apache.drill.exec.vector.BigIntVector(
> > > > >     org.apache.drill.exec.record.MaterializedField.create(
> > > > >         new org.apache.drill.common.expression.SchemaPath("bigints",
> > > > >             org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
> > > > >         org.apache.drill.common.types.Types.optional(
> > > > >             org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
> > > > >     allocator);
> > > > .
> > > > .
> > > > .
> > > > }
> > > >
> > > >
> > > >
> > > > > On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > >
> > > > > > I still have issues finding the correct way to create and use a
> > > > > > RepeatedHolder, and Writers are a non-starter for Workspace values. I
> > > > > > can make do with creating a concatenated string in a VarCharHolder for
> > > > > > small data sets to get past this in the short term and finish testing
> > > > > > the output values I expect, but I won't be able to do anything at scale
> > > > > > until I figure out how to make a repeated list.
> > > > >
> > > > > > On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > >
> > > > > >> Well... Converting from string to integers anyway... Too many 4th of
> > > > > >> July hot dogs. Going into nitrate overload. :)
> > > > > >>
> > > > > >> I am pulling an array of string values from json data. The string
> > > > > >> values are actually integers. I am converting to integers and summing
> > > > > >> each array entry to the final tally.
> > > > > >>
> > > > > >> On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > >>
> > > > >>> Ted,
> > > > >>>
> > > > > >>> Yes, I started out just getting a basic count to work. I am trying to
> > > > > >>> keep the workflow as close to a basic user as possible. As such, I am
> > > > > >>> building and using the MapR Apache Drill sandbox to test.
> > > > >>>
> > > > >>>
> > > > > >>>    1. Always look at the drillbit.log file to see if Drill had any
> > > > > >>>    issues loading your UDF. That was where I learned that all workspace
> > > > > >>>    values needed to be holders:
> > > > > >>>       - WARN  o.a.d.exec.expr.fn.FunctionConverter - Failure loading
> > > > > >>>       function class
> > > > > >>>       com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1,
> > > > > >>>       field xList. Aggregate function 'MyLinearRegression1' workspace
> > > > > >>>       variable 'xList' is of type 'interface
> > > > > >>>       org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
> > > > > >>>       Please change it to Holder type.
> > > > > >>>    2. Error messages:
> > > > > >>>       - If you get an error in this format, it means that Drill cannot
> > > > > >>>       find your function, so it probably didn't load it. Go back to
> > > > > >>>       step 1:
> > > > > >>>          - PARSE ERROR: From line 1, column 8 to line 1, column 44: No
> > > > > >>>          match found for function signature MyFunctionName(<ANY>)
> > > > > >>>       - If you get an error in this format, it means that the function
> > > > > >>>       is there but Drill could not find a signature that matched the
> > > > > >>>       param types or param count you were passing it. The exact wording
> > > > > >>>       will change, but "Missing function implementation" is the key
> > > > > >>>       phrase to look for:
> > > > > >>>          - Error: SYSTEM ERROR:
> > > > > >>>          org.apache.drill.exec.exception.SchemaChangeException: Failure
> > > > > >>>          while trying to materialize incoming schema.  Errors:
> > > > > >>>          Error in expression at index -1.  Error: Missing function
> > > > > >>>          implementation: [castBIGINT(VARCHAR-REPEATED)].  Full
> > > > > >>>          expression: --UNKNOWN EXPRESSION--
> > > > > >>>    3. In your function definition for aggregate functions you need
> > > > > >>>    to set null processing to internal and your isRandom to false.
> > > > > >>>    Example below:
> > > > > >>>       - @FunctionTemplate(name = "MyFunctionName", scope =
> > > > > >>>       FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> > > > > >>>       FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> > > > > >>>       isBinaryCommutative = false, costCategory =
> > > > > >>>       FunctionTemplate.FunctionCostCategory.COMPLEX)
> > > > >>>
> > > > > >>> Below is an example from the Apache Drill tutorial data sets contained
> > > > > >>> in the MapR Apache Drill sandbox. I am pulling an array of string
> > > > > >>> values from json data. The string values are actually integers. I am
> > > > > >>> converting to string and summing each array entry to the final tally.
> > > > > >>> This in no way represents what this data was for, but it did become a
> > > > > >>> handy way for me to peck out the "correct" way to build an aggregation
> > > > > >>> UDF function.
> > > > >>>
> > > > >>> @FunctionTemplate(name = "MyArraySum", scope =
> > > > >>> FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> > > > >>> FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> > > > >>> isBinaryCommutative = false, costCategory =
> > > > >>> FunctionTemplate.FunctionCostCategory.COMPLEX)
> > > > >>> public static class MyArraySum implements DrillAggFunc {
> > > > >>>
> > > > >>> @Param RepeatedVarCharHolder listToSearch;
> > > > >>> @Workspace NullableBigIntHolder count;
> > > > >>> @Workspace NullableBigIntHolder sum;
> > > > >>> @Workspace NullableVarCharHolder vc;
> > > > >>> @Output BigIntHolder out;
> > > > >>>
> > > > >>> @Override
> > > > >>> public void setup() {
> > > > >>> count.value=0;
> > > > >>> sum.value = 0;
> > > > >>> }
> > > > >>>
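> > > > > >>> @Override
> > > > > >>> public void add() {
> > > > > >>> // Number of entries in this row's array
> > > > > >>> int c = listToSearch.end - listToSearch.start;
> > > > > >>> int val = 0;
> > > > > >>> try {
> > > > > >>> for(int i=0; i<c; i++){
> > > > > >>> // start/end index into the shared vector, so offset each read by start
> > > > > >>> listToSearch.vector.getAccessor().get(listToSearch.start + i, vc);
> > > > > >>> String inputStr = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(vc.start, vc.end, vc.buffer);
> > > > > >>> val = Integer.parseInt(inputStr);
> > > > > >>> sum.value = sum.value + val;
> > > > > >>> }
> > > > > >>> } catch (Exception e) {
> > > > > >>> // A value that fails to parse ends the loop for this row
> > > > > >>> val = 0;
> > > > > >>> }
> > > > > >>> count.value = count.value + 1;
> > > > > >>> }
> > > > > >>>
> > > > > >>> // output() and reset() were not shown in the original message; the
> > > > > >>> // bodies below are assumed from the @Output and @Workspace declarations.
> > > > > >>> @Override
> > > > > >>> public void output() {
> > > > > >>> out.value = sum.value;
> > > > > >>> }
> > > > > >>>
> > > > > >>> @Override
> > > > > >>> public void reset() {
> > > > > >>> count.value = 0;
> > > > > >>> sum.value = 0;
> > > > > >>> }
> > > > > >>> }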
> > > > >>>
> > > > > >>> Example select statement:
> > > > > >>> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
> > > > > >>> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
> > > > >>>
> > > > > >>> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Jim,
> > > > > >>>>
> > > > > >>>> I think that you may be having trouble with aggregators in general.
> > > > > >>>>
> > > > > >>>> Have you been able to build *any* aggregator of anything?  I haven't.
> > > > > >>>>
> > > > > >>>> When I try to build an aggregator of ints or doubles, I get a very
> > > > > >>>> persistent problem with Drill even seeing my aggregates:
> > > > > >>>>
> > > > > >>>> 0: jdbc:drill:zk=local> *select sum_int(employee_id) from
> > > > > >>>> cp.`employee.json`;*
> > > > >>>>
> > > > > >>>> Jul 04, 2015 4:19:35 PM
> > > > > >>>> org.apache.calcite.sql.validate.SqlValidatorException <init>
> > > > > >>>>
> > > > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> > > > > >>>> found for function signature sum_int(<ANY>)
> > > > > >>>>
> > > > > >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException
> > > > > >>>> <init>
> > > > > >>>>
> > > > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > > > >>>> column 8 to line 1, column 27: No match found for function signature
> > > > > >>>> sum_int(<ANY>)
> > > > > >>>>
> > > > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
> > > > > >>>> found for function signature sum_int(<ANY>)*
> > > > > >>>>
> > > > > >>>> *[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010]
> > > > > >>>> (state=,code=0)*
> > > > > >>>>
> > > > > >>>> 0: jdbc:drill:zk=local> *select sum_int(cast(employee_id as int)) from
> > > > > >>>> cp.`employee.json`*;
> > > > > >>>>
> > > > > >>>> Jul 04, 2015 4:19:45 PM
> > > > > >>>> org.apache.calcite.sql.validate.SqlValidatorException <init>
> > > > > >>>>
> > > > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> > > > > >>>> found for function signature sum_int(<NUMERIC>)
> > > > > >>>>
> > > > > >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException
> > > > > >>>> <init>
> > > > > >>>>
> > > > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > > > >>>> column 8 to line 1, column 40: No match found for function signature
> > > > > >>>> sum_int(<NUMERIC>)
> > > > > >>>>
> > > > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
> > > > > >>>> found for function signature sum_int(<NUMERIC>)*
> > > > > >>>>
> > > > > >>>> *[Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010]
> > > > > >>>> (state=,code=0)*
> > > > > >>>>
> > > > > >>>> 0: jdbc:drill:zk=local>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> It looks like there is some undocumented subtlety about how to register
> > > > > >>>> an aggregator.
> > > > >>>>
> > > > > >>>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > >>>>
> > > > > >>>> > I'm working on the same thing. I want to aggregate a list of values.
> > > > > >>>> > It has been a search-and-guess game for the most part. I'm still stuck
> > > > > >>>> > in the process of getting the values all into a list. The writers look
> > > > > >>>> > interesting, but for aggregation functions it looks like the input is
> > > > > >>>> > the param and output objects can't hold the aggregation steps. The
> > > > > >>>> > Workspace is where that happens. If I try to use a Writer in a
> > > > > >>>> > workspace it won't load and tells me to change it to Holders, which
> > > > > >>>> > was why I was using them to start with. Maybe I'm missing the
> > > > > >>>> > architecture of the agg function. It looked like it was....
> > > > > >>>> >
> > > > > >>>> > @Param comes in -> initialize @Workspace vars in setup -> process data
> > > > > >>>> > through @Workspace vars in add -> finalize @Output in output.
> > > > > >>>> >
> > > > > >>>> > So I'm back to trying to figure out how to create a
> > > > > >>>> > RepeatedBigIntHolder or a RepeatedVarCharHolder...
> > > > >>>> >
> > > > >>>> >
> > > > >>>> >
> > > > > >>>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > > >>>> > wrote:
> > > > > >>>> >
> > > > > >>>> > > I am working on trying to build any kind of list-constructing
> > > > > >>>> > > aggregator and having absolute fits.
> > > > > >>>> > >
> > > > > >>>> > > To simplify life, I decided to just build a generic list builder that
> > > > > >>>> > > is a scalar function that returns a list containing its argument.
> > > > > >>>> > > Thus zoop(3) => [3], zoop('abc') => ['abc'] and zoop([1,2,3]) =>
> > > > > >>>> > > [[1,2,3]].
> > > > > >>>> > >
> > > > > >>>> > > The ComplexWriter looks like the place to go.  As usual, the complete
> > > > > >>>> > > lack of comments in most of Drill makes this very hard since I have
> > > > > >>>> > > to guess what works and what doesn't.
> > > > > >>>> > >
> > > > > >>>> > > In my code, I note that ComplexWriter has a nice rootAsList() method.
> > > > > >>>> > > I used this in zip and it works nicely to construct lists for output.
> > > > > >>>> > > I note that the resulting ListWriter has a method
> > > > > >>>> > > copyReader(FieldReader var1) which looks really good.
> > > > > >>>> > >
> > > > > >>>> > > Unfortunately, the only implementation of copyReader() is in
> > > > > >>>> > > AbstractFieldWriter and it looks like this:
> > > > > >>>> > >
> > > > > >>>> > > public void copyReader(FieldReader reader) {
> > > > > >>>> > >     this.fail("Copy FieldReader");
> > > > > >>>> > > }
> > > > > >>>> > >
> > > > > >>>> > > I would like to formally say at this point "WTF"?
> > > > > >>>> > >
> > > > > >>>> > > In digging in further, I see other methods that look handy like
> > > > > >>>> > >
> > > > > >>>> > > public void write(IntHolder holder) {
> > > > > >>>> > >     this.fail("Int");
> > > > > >>>> > > }
> > > > > >>>> > >
> > > > > >>>> > > And then in looking at implementations, it looks like there is a
> > > > > >>>> > > combinatorial explosion because every type seems to need a write
> > > > > >>>> > > method for every other type.
> > > > > >>>> > >
> > > > > >>>> > > What is the thought here?  How can I copy an arbitrary value into a
> > > > > >>>> > > list?
> > > > > >>>> > >
> > > > > >>>> > > My next thought was to build code that dispatches on type.  There is
> > > > > >>>> > > a method called getType() on the FieldReader.  Unfortunately, that
> > > > > >>>> > > drives into code generated by protoc and I see no way to dispatch on
> > > > > >>>> > > the type of an incoming value.
> > > > > >>>> > >
> > > > > >>>> > >
> > > > > >>>> > > How is this supposed to work?
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > >
> > > > > >>>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <baid.mehant@gmail.com>
> > > > > >>>> > > wrote:
> > > > > >>>> > >
> > > > > >>>> > > > For a detailed example of using the ComplexWriter interface you can
> > > > > >>>> > > > take a look at the Mappify
> > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>
> > > > > >>>> > > > (kvgen) function. The function itself is very simple; however, it
> > > > > >>>> > > > makes use of the utility methods in MappifyUtility
> > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
> > > > > >>>> > > > and MapUtility
> > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>
> > > > > >>>> > > > which perform most of the work.
> > > > > >>>> > > >
> > > > > >>>> > > > Currently we don't have a generic infrastructure to handle errors
> > > > > >>>> > > > coming out of functions. However, there is UserException, which when
> > > > > >>>> > > > raised will make sure that Drill does not gobble up the error
> > > > > >>>> > > > message in that exception. So you can probably throw a UserException
> > > > > >>>> > > > with the failing input in your function to make sure it propagates
> > > > > >>>> > > > to the user.
> > > > >>>> > > >
> > > > >>>> > > > Thanks
> > > > >>>> > > > Mehant
> > > > >>>> > > >
> > > > > >>>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <jacques@apache.org>
> > > > > >>>> > > > wrote:
> > > > > >>>> > > >
> > > > > >>>> > > > > *Holders are for both input and output.  You can also use
> > > > > >>>> > > > > ComplexWriter for output and FieldReader for input if you want to
> > > > > >>>> > > > > write or read a complex value.
> > > > > >>>> > > > >
> > > > > >>>> > > > > I don't think we've provided a really clean way to construct a
> > > > > >>>> > > > > Repeated*Holder for output purposes.  You can probably do it by
> > > > > >>>> > > > > reaching into a bunch of internal interfaces in Drill.  However, I
> > > > > >>>> > > > > would recommend using the ComplexWriter output pattern for now.
> > > > > >>>> > > > > This will be a little less efficient but substantially less
> > > > > >>>> > > > > brittle.  I suggest you open up a jira for using a Repeated*Holder
> > > > > >>>> > > > > as an output.
> > > > >>>> > > > >
> > > > > >>>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > > >>>> > > > > wrote:
> > > > >>>> > > > >
> > > > >>>> > > > > > Holders are for input, I think.
> > > > >>>> > > > > >
> > > > >>>> > > > > > Try the different kinds of writers.
> > > > >>>> > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > > >
> > > > > >>>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <jbates@maprtech.com>
> > > > > >>>> > > > > > wrote:
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > > Using a RepeatedHolder as a @Param I've got working. I was
> > > > > >>>> > > > > > > working on a custom aggregator function using DrillAggFunc. In
> > > > > >>>> > > > > > > this I can do simple things, but if I want to build a list of
> > > > > >>>> > > > > > > values and do something with it in the final output method I
> > > > > >>>> > > > > > > think I need to use RepeatedHolders in the @Workspace. To do
> > > > > >>>> > > > > > > that I need to create a new one in the setup method. I can't
> > > > > >>>> > > > > > > get one built. They all require a BufferAllocator to be passed
> > > > > >>>> > > > > > > in to build it. I have not found a way to get an allocator
> > > > > >>>> > > > > > > yet. Any suggestions?
> > > > >>>> > > > > > >
> > > > > >>>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning
> > > > > >>>> > > > > > > <ted.dunning@gmail.com> wrote:
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > > If you look at the zip function in
> > > > > >>>> > > > > > > > https://github.com/mapr-demos/simple-drill-functions you can
> > > > > >>>> > > > > > > > have an example of building a structure.
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > The basic idea is that your output is denoted as
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > >         @Output
> > > > > >>>> > > > > > > >         BaseWriter.ComplexWriter writer;
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > The pattern for building a list of lists of integers is like
> > > > > >>>> > > > > > > > this:
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > >         writer.setValueCount(n);
> > > > > >>>> > > > > > > >         ...
> > > > > >>>> > > > > > > >         BaseWriter.ListWriter outer = writer.rootAsList();
> > > > > >>>> > > > > > > >         outer.start(); // [ outer list
> > > > > >>>> > > > > > > >         ...
> > > > > >>>> > > > > > > >         // for each inner list {
> > > > > >>>> > > > > > > >             BaseWriter.ListWriter inner = outer.list();
> > > > > >>>> > > > > > > >             inner.start();
> > > > > >>>> > > > > > > >             // for each inner list element {
> > > > > >>>> > > > > > > >                 inner.integer().writeInt(accessor.get(i));
> > > > > >>>> > > > > > > >             // }
> > > > > >>>> > > > > > > >             inner.end();   // ] inner list
> > > > > >>>> > > > > > > >         // }
> > > > > >>>> > > > > > > >         outer.end(); // ] outer list
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates
> > > > > >>>> > > > > > > > <jbates@maprtech.com> wrote:
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > > I have working aggregation and simple UDFs. I've been trying
> > > > > >>>> > > > > > > > > to document and understand each of the options available in
> > > > > >>>> > > > > > > > > a Drill UDF: the different FunctionScopes, the ones that are
> > > > > >>>> > > > > > > > > allowed and the ones that are not; the impact of different
> > > > > >>>> > > > > > > > > cost categories; the different steps needed to understand
> > > > > >>>> > > > > > > > > handling any of the supported data types and structures in
> > > > > >>>> > > > > > > > > Drill.
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > > Here are a few of my current roadblocks. Any pointers would
> > > > > >>>> > > > > > > > > be greatly appreciated.
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > >    1. I've been trying to understand how to correctly use
> > > > > >>>> > > > > > > > >    RepeatedHolders of whatever type. For this discussion
> > > > > >>>> > > > > > > > >    let's start with a RepeatedBigIntHolder. I'm trying to
> > > > > >>>> > > > > > > > >    figure out the best way to create a new one. I have not
> > > > > >>>> > > > > > > > >    figured out where in the existing Drill code someone does
> > > > > >>>> > > > > > > > >    this. If I use a RepeatedBigIntHolder as a Workspace
> > > > > >>>> > > > > > > > >    object it is null to start with. I created a new one in
> > > > > >>>> > > > > > > > >    the startup section of the UDF but the vector was null. I
> > > > > >>>> > > > > > > > >    can find no reference on creating a new BigIntVector.
> > > > > >>>> > > > > > > > >    There is a way to create a BigIntVector, and I did find
> > > > > >>>> > > > > > > > >    an example of creating a new VarCharVector, but I can't
> > > > > >>>> > > > > > > > >    do that using the Drill jar files from 1.0. The
> > > > > >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos and the
> > > > > >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos.MinorType
> > > > > >>>> > > > > > > > >    classes do not appear to be accessible from the Drill
> > > > > >>>> > > > > > > > >    jar files.
> > > > > >>>> > > > > > > > >    2. What is the best way to close out a UDF in the event
> > > > > >>>> > > > > > > > >    it generates an exception? Are there specific steps one
> > > > > >>>> > > > > > > > >    should follow to make a clean exit in a catch block that
> > > > > >>>> > > > > > > > >    are beneficial to Drill?
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>
> > > > >
> > > >
> > >
> >
>
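
The aggregate lifecycle described in the thread (@Param comes in, @Workspace vars are initialized in setup, data is processed through them in add, and @Output is finalized in output) can be sketched in plain Java with no Drill dependency. This is only an illustration under stated assumptions: `AggSketch` and `ArraySumSketch` are hypothetical names, not Drill's `DrillAggFunc` or its holder classes, and each row is modeled as a list of UTF-8 byte arrays standing in for a repeated VARCHAR value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the setup -> add -> output lifecycle.
// NOT Drill's DrillAggFunc; no Drill classes are used here.
interface AggSketch<I, O> {
    void setup();      // initialize workspace state
    void add(I row);   // fold one incoming row into the workspace
    O output();        // produce the final value
}

// Mirrors the MyArraySum idea: each row is an array of integer strings.
class ArraySumSketch implements AggSketch<List<byte[]>, Long> {
    private long sum;   // plays the role of the workspace sum
    private long rows;  // plays the role of the workspace count

    @Override
    public void setup() {
        sum = 0;
        rows = 0;
    }

    @Override
    public void add(List<byte[]> row) {
        for (byte[] utf8 : row) {
            String text = new String(utf8, StandardCharsets.UTF_8).trim();
            try {
                sum += Long.parseLong(text);
            } catch (NumberFormatException e) {
                // Assumption for this sketch: skip only the malformed element
                // and keep summing the rest of the row.
            }
        }
        rows++;
    }

    @Override
    public Long output() {
        return sum;
    }

    long rowsSeen() {
        return rows;
    }

    // Small helper so callers can build rows from ordinary strings.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ArraySumSketch agg = new ArraySumSketch();
        agg.setup();
        agg.add(Arrays.asList(utf8("1"), utf8("2")));
        agg.add(Arrays.asList(utf8("3"), utf8("oops")));
        System.out.println(agg.output() + " from " + agg.rowsSeen() + " rows");
    }
}
```

Running `main` prints `6 from 2 rows`. Skipping a single bad element is a choice made for this sketch; the function posted in the thread wraps the whole loop in one try block, so there a parse failure ends processing for that row.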

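
The two error shapes discussed in the thread can be told apart by their key phrases: "No match found for function signature" suggests the function was never loaded, while "Missing function implementation" suggests it loaded but no signature matched the arguments. A minimal sketch of that triage follows. `UdfErrorTriage` and its `Diagnosis` values are hypothetical names made up for illustration; nothing here is part of Drill, and the matching is a plain substring check on the message text quoted in the thread.

```java
import java.util.Locale;

// Hypothetical helper, not part of Drill: classifies an error message
// by the key phrases called out in the thread.
class UdfErrorTriage {
    enum Diagnosis {
        FUNCTION_NOT_LOADED,   // check the drillbit log for load failures
        SIGNATURE_MISMATCH,    // function exists; argument types or count differ
        UNKNOWN
    }

    static Diagnosis classify(String message) {
        if (message == null) {
            return Diagnosis.UNKNOWN;
        }
        String m = message.toLowerCase(Locale.ROOT);
        if (m.contains("no match found for function signature")) {
            return Diagnosis.FUNCTION_NOT_LOADED;
        }
        if (m.contains("missing function implementation")) {
            return Diagnosis.SIGNATURE_MISMATCH;
        }
        return Diagnosis.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(
            "PARSE ERROR: From line 1, column 8 to line 1, column 27: "
            + "No match found for function signature sum_int(<ANY>)"));
        System.out.println(classify(
            "Error: Missing function implementation: "
            + "[castBIGINT(VARCHAR-REPEATED)]."));
    }
}
```

The exact wording of these messages may differ between Drill versions, so the substrings above should be treated as assumptions taken from the messages in this thread rather than a stable contract.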