drill-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Some questions on UDFs
Date Sun, 05 Jul 2015 22:15:25 GMT
That's good news Jim.

These are interfaces that very few people have built against to date so any
suggestions for improvements, clarifications, etc would be greatly
appreciated.

Thanks
Jacques
On Jul 5, 2015 3:07 PM, "Jim Bates" <jbates@maprtech.com> wrote:

> Just to close out this thread....
>
> I got my final UDFs to work. I ended up with 2. One to create an array of
> values and the other to calculate a simple linear regression. This data set
> was a simple x = y slope
>
> SELECT MyLinearRegression2(xValues,yValues,CAST(22356 as BIGINT)) as
> xPerdict FROM (SELECT MyList(test_field1) as xValues, MyList(test_field2)
> as yValues  FROM (SELECT test_field1,test_field2 FROM
> `hive.default`.`my_hive_table` limit 10));
> +-----------+
> | xPerdict  |
> +-----------+
> | 22356.0   |
> +-----------+
>
>
> On Sun, Jul 5, 2015 at 4:10 PM, Jacques Nadeau <jacques@apache.org> wrote:
>
> > You're right.  You're off the beaten path. I think everyone here would love
> > to have more documentation and more comments. Of course, all of these take
> > time.
> >
> > If you have time to volunteer to help improve these things, that would be
> > great.
> >
> > With regards to the question about the jira, describe your use case and
> > what functionality you couldn't find or make work. The active developers on
> > the project can then do their best to help shape the Jira into better docs,
> > javadocs and/or new functionality as time allows.
> >
> > On Jul 5, 2015 1:37 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
> >
> > > Uh... actually, I think that it isn't obvious because there is absolutely
> > > no documentation and there are no comments in the code.
> > >
> > > And what should the JIRA say?  We can't even tell what's missing, if
> > > anything, because we can't tell how it is supposed to work.
> > >
> > >
> > >
> > >
> > > On Sun, Jul 5, 2015 at 11:50 AM, Jacques Nadeau <jacques@apache.org>
> > > wrote:
> > >
> > > > It isn't obvious because you shouldn't do it.  Please file a JIRA to add
> > > > real support for this type of output.
> > > >
> > > > Your current function would leak large amounts of memory that would
> > > > ultimately crash the node.
> > > >
> > > > Realistically, there are very few internal Drill APIs that you should
> > > > access via a UDF (injectables, holders, complexwriter, fieldreader and
> > > > helpers).  A post 1.0 goal was to provide a UDF interface JAR to ensure
> > > > people don't accidentally reach into Drill's internals.  (A later
> > > > possibility is bytecode weaving to completely protect against it).
> > > >
> > > > J
> > > >
> > > > On Sun, Jul 5, 2015 at 11:36 AM, Ted Dunning <ted.dunning@gmail.com>
> > > > wrote:
> > > >
> > > > > That was impressively non-obvious.
> > > > >
> > > > >
> > > > >
> > > > > > On Sat, Jul 4, 2015 at 6:40 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > >
> > > > > > > I did get a new RepeatedBigIntHolder built and added a BigIntVector
> > > > > > > to it. I'll try it in the UDF tomorrow and see if there is a difference
> > > > > > > in the ways I found to get a BufferAllocator.
> > > > > >
> > > > > > .
> > > > > > .
> > > > > > .
> > > > > > @Inject DrillBuf buffer;
> > > > > > @Workspace RepeatedBigIntHolder yList;
> > > > > > .
> > > > > > .
> > > > > > .
> > > > > > @Override
> > > > > > public void setup() {
> > > > > > .
> > > > > > .
> > > > > > .
> > > > > > > //org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
> > > > > > > org.apache.drill.exec.memory.BufferAllocator allocator =
> > > > > > >     new org.apache.drill.exec.memory.TopLevelAllocator();
> > > > > > > yList = new RepeatedBigIntHolder();
> > > > > > > yList.vector = new org.apache.drill.exec.vector.BigIntVector(
> > > > > > >     org.apache.drill.exec.record.MaterializedField.create(
> > > > > > >         new org.apache.drill.common.expression.SchemaPath("bigints",
> > > > > > >             org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
> > > > > > >         org.apache.drill.common.types.Types.optional(
> > > > > > >             org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
> > > > > > >     allocator);
> > > > > > .
> > > > > > .
> > > > > > .
> > > > > > }
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > >
> > > > > > > > I still have issues finding the correct way to create and use a
> > > > > > > > RepeatedHolder and Writers are a non starter for Workspace values. I can
> > > > > > > > make do with creating a concatenated string in a VarCharHolder for small
> > > > > > > > data sets to get past this in the short term and finish testing the output
> > > > > > > > values I expect but won't be able to do any scale till I figure out how to
> > > > > > > > make a repeated list.
> > > > > > >
> > > > > > > > On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > > >
> > > > > > > >> Well... Converting from string to integers anyway... Too many 4th of July
> > > > > > > >> Hot Dogs. Going into nitrate overload. :)
> > > > > > >>
> > > > > > > >> I am pulling an array of string values from json data. The string values
> > > > > > > >> are actually integers. I am converting to integers and summing each
> > > > > > > >> array entry to the final tally.
> > > > > > >>
> > > > > > > >> On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > >>
> > > > > > >>> Ted,
> > > > > > >>>
> > > > > > > >>> Yes, I started out just getting a basic count to work. I am trying to
> > > > > > > >>> keep the workflow as close to a basic user as possible. As such, I am
> > > > > > > >>> building and using the MapR Apache Drill sandbox to test.
> > > > > > >>>
> > > > > > >>>
> > > > > > > >>>    1. Always look at the drillbit.log file to see if Drill had any
> > > > > > > >>>    issues loading your UDF. That was where I learned that all workspace
> > > > > > > >>>    values needed to be holders:
> > > > > > > >>>       - WARN  o.a.d.exec.expr.fn.FunctionConverter - Failure loading
> > > > > > > >>>       function class
> > > > > > > >>>       com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1,
> > > > > > > >>>       field xList. Aggregate function 'MyLinearRegression1' workspace
> > > > > > > >>>       variable 'xList' is of type 'interface
> > > > > > > >>>       org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
> > > > > > > >>>       Please change it to Holder type.
> > > > > > > >>>    2. Error messages:
> > > > > > > >>>       - If you get an error in this format it means that Drill cannot
> > > > > > > >>>       find your function, so it probably didn't load it. Back to step 1:
> > > > > > > >>>          - PARSE ERROR: From line 1, column 8 to line 1, column 44: No
> > > > > > > >>>          match found for function signature MyFunctionName(<ANY>)
> > > > > > > >>>       - If you get an error in this format it means that the function
> > > > > > > >>>       is there but Drill could not find a signature that matched the
> > > > > > > >>>       param types or param numbers you were passing it. The exact wording
> > > > > > > >>>       will change but "Missing function implementation" is the key phrase
> > > > > > > >>>       to look for:
> > > > > > > >>>          - Error: SYSTEM ERROR:
> > > > > > > >>>          org.apache.drill.exec.exception.SchemaChangeException: Failure
> > > > > > > >>>          while trying to materialize incoming schema.  Errors:
> > > > > > > >>>          - Error in expression at index -1.  Error: Missing function
> > > > > > > >>>          implementation: [castBIGINT(VARCHAR-REPEATED)].  Full expression:
> > > > > > > >>>          --UNKNOWN EXPRESSION--
> > > > > > > >>>    3. In your function definition for aggregate functions you need
> > > > > > > >>>    to set null processing to internal and your isRandom to false.
> > > > > > > >>>    Example below:
> > > > > > > >>>       - @FunctionTemplate(name = "MyFunctionName", scope =
> > > > > > > >>>       FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> > > > > > > >>>       FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> > > > > > > >>>       isBinaryCommutative = false, costCategory =
> > > > > > > >>>       FunctionTemplate.FunctionCostCategory.COMPLEX)
> > > > > > >>>
> > > > > > > >>> Below is an example from the Apache Drill tutorial data sets contained
> > > > > > > >>> in the MapR Apache Drill sandbox. I am pulling an array of string values
> > > > > > > >>> from json data. The string values are actually integers. I am converting to
> > > > > > > >>> string and summing each array entry to the final tally. This in no way
> > > > > > > >>> represents what this data was for but it did become a handy way for me to
> > > > > > > >>> peck out the "correct" way to build an aggregation UDF function
> > > > > > >>>
> > > > > > >>> @FunctionTemplate(name = "MyArraySum", scope =
> > > > > > >>> FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> > > > > > >>> FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> > > > > > >>> isBinaryCommutative = false, costCategory =
> > > > > > >>> FunctionTemplate.FunctionCostCategory.COMPLEX)
> > > > > > >>> public static class MyArraySum implements DrillAggFunc {
> > > > > > >>>
> > > > > > >>> @Param RepeatedVarCharHolder listToSearch;
> > > > > > >>> @Workspace NullableBigIntHolder count;
> > > > > > >>> @Workspace NullableBigIntHolder sum;
> > > > > > >>> @Workspace NullableVarCharHolder vc;
> > > > > > >>> @Output BigIntHolder out;
> > > > > > >>>
> > > > > > >>> @Override
> > > > > > >>> public void setup() {
> > > > > > >>> count.value=0;
> > > > > > >>> sum.value = 0;
> > > > > > >>> }
> > > > > > >>>
> > > > > > >>> @Override
> > > > > > >>> public void add() {
> > > > > > >>> int c = listToSearch.end - listToSearch.start;
> > > > > > >>> int val = 0;
> > > > > > >>> try {
> > > > > > >>> for(int i=0; i<c; i++){
> > > > > > >>> listToSearch.vector.getAccessor().get(i, vc);
> > > > > > > >>> String inputStr =
> > > > > > > >>>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(vc.start,
> > > > > > > >>>     vc.end, vc.buffer);
> > > > > > >>> val = Integer.parseInt(inputStr);
> > > > > > >>> sum.value = sum.value + val;
> > > > > > >>> }
> > > > > > >>> } catch (Exception e) {
> > > > > > >>> val = 0;
> > > > > > >>> }
> > > > > > >>> count.value = count.value + 1;
> > > > > > >>> }
> > > > > > >>>
> > > > > > >>> Example select statement:
> > > > > > > >>> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
> > > > > > > >>> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
> > > > > > >>>
> > > > > > > >>> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>>> Jim,
> > > > > > >>>>
> > > > > > > >>>> I think that you may be having trouble with aggregators in general.
> > > > > > > >>>>
> > > > > > > >>>> Have you been able to build *any* aggregator of anything?  I haven't.
> > > > > > > >>>>
> > > > > > > >>>> When I try to build an aggregator of int's or doubles, I get a very
> > > > > > > >>>> persistent problem with Drill even seeing my aggregates:
> > > > > > >>>>
> > > > > > >>>> 0: jdbc:drill:zk=local> *select sum_int(employee_id) from
> > > > > > >>>> cp.`employee.json`;*
> > > > > > >>>>
> > > > > > >>>> Jul 04, 2015 4:19:35 PM
> > > > > > >>>> org.apache.calcite.sql.validate.SqlValidatorException <init>
> > > > > > >>>>
> > > > > > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> > > > > > > >>>> found for function signature sum_int(<ANY>)
> > > > > > > >>>>
> > > > > > > >>>> Jul 04, 2015 4:19:35 PM
> > > > > > > >>>> org.apache.calcite.runtime.CalciteException <init>
> > > > > > > >>>>
> > > > > > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > > > > > >>>> column 8 to line 1, column 27: No match found for function signature
> > > > > > > >>>> sum_int(<ANY>)
> > > > > > > >>>>
> > > > > > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
> > > > > > > >>>> found for function signature sum_int(<ANY>)*
> > > > > > > >>>>
> > > > > > > >>>> *[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010]
> > > > > > > >>>> (state=,code=0)*
> > > > > > >>>>
> > > > > > > >>>> 0: jdbc:drill:zk=local> *select sum_int(cast(employee_id as int)) from
> > > > > > > >>>> cp.`employee.json`*;
> > > > > > > >>>>
> > > > > > > >>>> Jul 04, 2015 4:19:45 PM
> > > > > > > >>>> org.apache.calcite.sql.validate.SqlValidatorException <init>
> > > > > > > >>>>
> > > > > > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> > > > > > > >>>> found for function signature sum_int(<NUMERIC>)
> > > > > > > >>>>
> > > > > > > >>>> Jul 04, 2015 4:19:45 PM
> > > > > > > >>>> org.apache.calcite.runtime.CalciteException <init>
> > > > > > > >>>>
> > > > > > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > > > > > >>>> column 8 to line 1, column 40: No match found for function signature
> > > > > > > >>>> sum_int(<NUMERIC>)
> > > > > > > >>>>
> > > > > > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
> > > > > > > >>>> found for function signature sum_int(<NUMERIC>)*
> > > > > > > >>>>
> > > > > > > >>>> *[Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010]
> > > > > > > >>>> (state=,code=0)*
> > > > > > >>>>
> > > > > > >>>> 0: jdbc:drill:zk=local>
> > > > > > >>>>
> > > > > > >>>>
> > > > > > > >>>> It looks like there is some undocumented subtlety about how to register
> > > > > > > >>>> an aggregator.
> > > > > > >>>>
> > > > > > > >>>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>> > I'm working on the same thing. I want to aggregate a list of values. It has
> > > > > > > >>>> > been a search and guess game for the most part. I'm still stuck in the
> > > > > > > >>>> > process of getting the values all into a list. The writers look interesting
> > > > > > > >>>> > but for aggregation functions it looks like the input is the param and
> > > > > > > >>>> > output objects can't hold the aggregation steps. The Workspace is where
> > > > > > > >>>> > that happens. If I try and use a Writer in a workspace it won't load and
> > > > > > > >>>> > tells me to change it to Holders which was why I was using them to start
> > > > > > > >>>> > with. Maybe I'm missing the architecture of the agg function. It looked
> > > > > > > >>>> > like it was....
> > > > > > > >>>> >
> > > > > > > >>>> > @Param comes in -> initialize @Workspace vars in setup -> process data
> > > > > > > >>>> > through @Workspace vars in add -> finalize @Output in output.
> > > > > > > >>>> >
> > > > > > > >>>> > So I'm back to trying to figure out how to create a RepeatedBigIntHolder or
> > > > > > > >>>> > a RepeatedVarCharHolder...
> > > > > > >>>> >
> > > > > > >>>> >
> > > > > > >>>> >
> > > > > > > >>>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > > >>>> >
> > > > > > > >>>> > > I am working on trying to build any kind of list constructing aggregator
> > > > > > > >>>> > > and having absolute fits.
> > > > > > > >>>> > >
> > > > > > > >>>> > > To simplify life, I decided to just build a generic list builder that is a
> > > > > > > >>>> > > scalar function that returns a list containing its argument.  Thus zoop(3)
> > > > > > > >>>> > > => [3], zoop('abc') => ['abc'] and zoop([1,2,3]) => [[1,2,3]].
> > > > > > >>>> > >
> > > > > > > >>>> > > The ComplexWriter looks like the place to go. As usual, the complete lack
> > > > > > > >>>> > > of comments in most of Drill makes this very hard since I have to guess
> > > > > > > >>>> > > what works and what doesn't.
> > > > > > > >>>> > >
> > > > > > > >>>> > > In my code, I note that ComplexWriter has a nice rootAsList() method.  I
> > > > > > > >>>> > > used this in zip and it works nicely to construct lists for output.  I note
> > > > > > > >>>> > > that the resulting ListWriter has a method copyReader(FieldReader var1)
> > > > > > > >>>> > > which looks really good.
> > > > > > > >>>> > >
> > > > > > > >>>> > > Unfortunately, the only implementation of copyReader() is in
> > > > > > > >>>> > > AbstractFieldWriter and it looks like this:
> > > > > > >>>> > >
> > > > > > >>>> > > public void copyReader(FieldReader reader) {
> > > > > > >>>> > >     this.fail("Copy FieldReader");
> > > > > > >>>> > > }
> > > > > > >>>> > >
> > > > > > >>>> > > I would like to formally say at this point "WTF"?
> > > > > > >>>> > >
> > > > > > > >>>> > > In digging in further, I see other methods that look handy like
> > > > > > >>>> > >
> > > > > > >>>> > > public void write(IntHolder holder) {
> > > > > > >>>> > >     this.fail("Int");
> > > > > > >>>> > > }
> > > > > > >>>> > >
> > > > > > > >>>> > > And then in looking at implementations, it looks like there is a
> > > > > > > >>>> > > combinatorial explosion because every type seems to need a write method for
> > > > > > > >>>> > > every other type.
> > > > > > > >>>> > >
> > > > > > > >>>> > > What is the thought here?  How can I copy an arbitrary value into a list?
> > > > > > > >>>> > >
> > > > > > > >>>> > > My next thought was to build code that dispatches on type.  There is a
> > > > > > > >>>> > > method called getType() on the FieldReader.  Unfortunately, that drives
> > > > > > > >>>> > > into code generated by protoc and I see no way to dispatch on the type of
> > > > > > > >>>> > > an incoming value.
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > > How is this supposed to work?
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > > >>>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <baid.mehant@gmail.com> wrote:
> > > > > > > >>>> > >
> > > > > > > >>>> > > > For a detailed example on using ComplexWriter interface you can take a look
> > > > > > > >>>> > > > at the Mappify
> > > > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>
> > > > > > > >>>> > > > (kvgen) function. The function itself is very simple however it makes use
> > > > > > > >>>> > > > of the utility methods in MappifyUtility
> > > > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
> > > > > > > >>>> > > > and MapUtility
> > > > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>
> > > > > > > >>>> > > > which perform most of the work.
> > > > > > > >>>> > > >
> > > > > > > >>>> > > > Currently we don't have a generic infrastructure to handle errors coming
> > > > > > > >>>> > > > out of functions. However there is UserException, which when raised will
> > > > > > > >>>> > > > make sure that Drill does not gobble up the error message in that
> > > > > > > >>>> > > > exception. So you can probably throw a UserException with the failing input
> > > > > > > >>>> > > > in your function to make sure it propagates to the user.
> > > > > > >>>> > > >
> > > > > > >>>> > > > Thanks
> > > > > > >>>> > > > Mehant
> > > > > > >>>> > > >
> > > > > > > >>>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <jacques@apache.org> wrote:
> > > > > > > >>>> > > >
> > > > > > > >>>> > > > > *Holders are for both input and output.  You can also use ComplexWriter for
> > > > > > > >>>> > > > > output and FieldReader for input if you want to write or read a complex
> > > > > > > >>>> > > > > value.
> > > > > > > >>>> > > > >
> > > > > > > >>>> > > > > I don't think we've provided a really clean way to construct a
> > > > > > > >>>> > > > > Repeated*Holder for output purposes.  You can probably do it by reaching
> > > > > > > >>>> > > > > into a bunch of internal interfaces in Drill.  However, I would recommend
> > > > > > > >>>> > > > > using the ComplexWriter output pattern for now.  This will be a little less
> > > > > > > >>>> > > > > efficient but substantially less brittle.  I suggest you open up a jira for
> > > > > > > >>>> > > > > using a Repeated*Holder as an output.
> > > > > > >>>> > > > >
> > > > > > > >>>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > >>>> > > > >
> > > > > > >>>> > > > > > Holders are for input, I think.
> > > > > > >>>> > > > > >
> > > > > > >>>> > > > > > Try the different kinds of writers.
> > > > > > >>>> > > > > >
> > > > > > >>>> > > > > >
> > > > > > >>>> > > > > >
> > > > > > > >>>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > > >>>> > > > > >
> > > > > > > >>>> > > > > > > Using a repeatedholder as a @param I've got working. I was working on a
> > > > > > > >>>> > > > > > > custom aggregator function using DrillAggFunc. In this I can do simple
> > > > > > > >>>> > > > > > > things but if I want to build a list of values and do something with it in
> > > > > > > >>>> > > > > > > the final output method I think I need to use RepeatedHolders in the
> > > > > > > >>>> > > > > > > @Workspace. To do that I need to create a new one in the setup method. I
> > > > > > > >>>> > > > > > > can't get one built. They all require a BufferAllocator to be passed in to
> > > > > > > >>>> > > > > > > build it. I have not found a way to get an allocator yet. Any suggestions?
> > > > > > >>>> > > > > > >
> > > > > > > >>>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > > >>>> > > > > > >
> > > > > > > >>>> > > > > > > > If you look at the zip function in
> > > > > > > >>>> > > > > > > > https://github.com/mapr-demos/simple-drill-functions you can have an
> > > > > > > >>>> > > > > > > > example of building a structure.
> > > > > > > >>>> > > > > > > >
> > > > > > > >>>> > > > > > > > The basic idea is that your output is denoted as
> > > > > > > >>>> > > > > > > >
> > > > > > > >>>> > > > > > > >         @Output
> > > > > > > >>>> > > > > > > >         BaseWriter.ComplexWriter writer;
> > > > > > > >>>> > > > > > > >
> > > > > > > >>>> > > > > > > > The pattern for building a list of lists of integers is like this:
> > > > > > > >>>> > > > > > > >
> > > > > > > >>>> > > > > > > >         writer.setValueCount(n);
> > > > > > > >>>> > > > > > > >         ...
> > > > > > > >>>> > > > > > > >         BaseWriter.ListWriter outer = writer.rootAsList();
> > > > > > > >>>> > > > > > > >         outer.start(); // [ outer list
> > > > > > > >>>> > > > > > > >         ...
> > > > > > > >>>> > > > > > > >         // for each inner list
> > > > > > > >>>> > > > > > > >             BaseWriter.ListWriter inner = outer.list();
> > > > > > > >>>> > > > > > > >             inner.start();
> > > > > > > >>>> > > > > > > >             // for each inner list element
> > > > > > > >>>> > > > > > > >                 inner.integer().writeInt(accessor.get(i));
> > > > > > > >>>> > > > > > > >             }
> > > > > > > >>>> > > > > > > >             inner.end();   // ] inner list
> > > > > > > >>>> > > > > > > >         }
> > > > > > > >>>> > > > > > > >         outer.end(); // ] outer list
> > > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > > >
> > > > > > > >>>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > > >>>> > > > > > > >
> > > > > > > >>>> > > > > > > > > I have working aggregation and simple UDFs. I've been trying to document
> > > > > > > >>>> > > > > > > > > and understand each of the options available in a Drill UDF. Understanding
> > > > > > > >>>> > > > > > > > > the different FunctionScope's, the ones that are allowed, the ones that are
> > > > > > > >>>> > > > > > > > > not. The impact of different cost categories. The different steps needed
> > > > > > > >>>> > > > > > > > > to understand handling any of the supported data types and structures in
> > > > > > > >>>> > > > > > > > > drill.
> > > > > > > >>>> > > > > > > > >
> > > > > > > >>>> > > > > > > > > Here are a few of my current road blocks. Any pointers would be greatly
> > > > > > > >>>> > > > > > > > > appreciated.
> > > > > > > >>>> > > > > > > > >
> > > > > > > >>>> > > > > > > > >
> > > > > > > >>>> > > > > > > > >    1. I've been trying to understand how to correctly use RepeatedHolders
> > > > > > > >>>> > > > > > > > >    of whatever type. For this discussion let's start with a
> > > > > > > >>>> > > > > > > > >    RepeatedBigIntHolder. I'm trying to figure out the best way to create a
> > > > > > > >>>> > > > > > > > >    new one. I have not figured out where in the existing drill code someone
> > > > > > > >>>> > > > > > > > >    does this. If I use a RepeatedBigIntHolder as a Workspace object it is
> > > > > > > >>>> > > > > > > > >    null to start with. I created a new one in the startup section of the udf
> > > > > > > >>>> > > > > > > > >    but the vector was null. I can find no reference in creating a new
> > > > > > > >>>> > > > > > > > >    BigIntVector. There is a way to create a BigIntVector and I did find an
> > > > > > > >>>> > > > > > > > >    example of creating a new VarCharVector but I can't do that using the
> > > > > > > >>>> > > > > > > > >    drill jar files from 1.0. The org.apache.drill.common.types.TypeProtos
> > > > > > > >>>> > > > > > > > >    and the org.apache.drill.common.types.TypeProtos.MinorType classes do
> > > > > > > >>>> > > > > > > > >    not appear to be accessible from the drill jar files.
> > > > > > > >>>> > > > > > > > >    2. What is the best way to close out a UDF in the event it generates an
> > > > > > > >>>> > > > > > > > >    exception? Are there specific steps one should follow to make a clean
> > > > > > > >>>> > > > > > > > >    exit in a catch block that are beneficial to Drill?
> > > > > > >>>> > > > > > > > >
> > > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > >
> > > > > > >>>> > > > > >
> > > > > > >>>> > > > >
> > > > > > >>>> > > >
> > > > > > >>>> > >
> > > > > > >>>> >
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
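
For anyone following the thread: the setup -> add -> output flow Jim describes, and the simple linear regression his final UDF computes, can be sketched in plain Java. This is only an illustration under stated assumptions. It uses no Drill classes (no holders, no @Workspace injection, no allocator), and the names LinearRegressionSketch, setup, add and output are hypothetical stand-ins, not the code from the thread. It keeps running sums instead of collecting the values into a list, which is one possible way to fit a line without needing a repeated holder in the workspace.

```java
// Plain-Java sketch only: no Drill classes are used and every name here is
// hypothetical. It mirrors the setup -> add -> output flow from the thread.
class LinearRegressionSketch {
    // Running sums stand in for what would be @Workspace state in a UDF.
    private long n;
    private double sumX;
    private double sumY;
    private double sumXY;
    private double sumXX;

    // Mirrors setup(): reset the accumulated state.
    void setup() {
        n = 0;
        sumX = 0.0;
        sumY = 0.0;
        sumXY = 0.0;
        sumXX = 0.0;
    }

    // Mirrors add(): fold one (x, y) pair into the running sums.
    void add(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXY += x * y;
        sumXX += x * x;
    }

    // Mirrors output(): fit y = slope * x + intercept by least squares
    // and return the predicted y for the given x.
    double output(double x) {
        double denom = n * sumXX - sumX * sumX;
        if (n < 2 || denom == 0.0) {
            return Double.NaN; // not enough distinct x values to fit a line
        }
        double slope = (n * sumXY - sumX * sumY) / denom;
        double intercept = (sumY - slope * sumX) / n;
        return slope * x + intercept;
    }

    public static void main(String[] args) {
        LinearRegressionSketch agg = new LinearRegressionSketch();
        agg.setup();
        // An x = y data set, like the one described at the top of the thread.
        for (int i = 1; i <= 10; i++) {
            agg.add(i, i);
        }
        System.out.println(agg.output(22356));
    }
}
```

With an x = y data set the fitted slope is 1 and the intercept is 0, so the prediction for 22356 comes back as the same value, matching the shape of the query result quoted above. Whether a real Drill aggregate can carry this state in plain numeric holders is not something this sketch tries to show.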
