drill-dev mailing list archives

From Jim Bates <jba...@maprtech.com>
Subject Re: Some questions on UDFs
Date Sun, 05 Jul 2015 22:06:43 GMT
Just to close out this thread....

I got my final UDFs to work. I ended up with two: one to create an array of
values and the other to calculate a simple linear regression. This data set
was a simple x = y line (slope 1).

SELECT MyLinearRegression2(xValues, yValues, CAST(22356 AS BIGINT)) AS xPerdict
FROM (SELECT MyList(test_field1) AS xValues, MyList(test_field2) AS yValues
      FROM (SELECT test_field1, test_field2
            FROM `hive.default`.`my_hive_table` LIMIT 10));
+-----------+
| xPerdict  |
+-----------+
| 22356.0   |
+-----------+
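
For readers who want the arithmetic behind a result like this, here is a
minimal plain-Java sketch of a least-squares fit, organized in the same
setup/add/output shape as an aggregate function. It is not the
MyLinearRegression2 UDF from this thread and it uses no Drill classes; the
class name and method names are hypothetical and chosen only for illustration.

```java
/** Plain-Java sketch of simple linear regression; not a Drill UDF. */
public class LinearRegressionSketch {
    // Running sums play the role that @Workspace fields play in an aggregate UDF.
    private long n;
    private double sumX, sumY, sumXY, sumXX;

    /** Analogous to setup()/reset(): clear the accumulated state. */
    public void setup() {
        n = 0;
        sumX = sumY = sumXY = sumXX = 0.0;
    }

    /** Analogous to add(): fold one (x, y) pair into the running sums. */
    public void add(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXY += x * y;
        sumXX += x * x;
    }

    /** Analogous to output(): fit y = slope * x + intercept, then evaluate at x. */
    public double predict(double x) {
        double denom = n * sumXX - sumX * sumX;
        if (n < 2 || denom == 0.0) {
            throw new IllegalStateException("need at least two distinct x values");
        }
        double slope = (n * sumXY - sumX * sumY) / denom;
        double intercept = (sumY - slope * sumX) / n;
        return slope * x + intercept;
    }

    public static void main(String[] args) {
        LinearRegressionSketch lr = new LinearRegressionSketch();
        lr.setup();
        for (int i = 1; i <= 10; i++) {
            lr.add(i, i); // ten points on the line y = x
        }
        System.out.println(lr.predict(22356)); // prints 22356.0
    }
}
```

Because every sample point lies on y = x, the fitted slope is 1 and the
intercept is 0, so the prediction equals the input, matching the query output
above.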


On Sun, Jul 5, 2015 at 4:10 PM, Jacques Nadeau <jacques@apache.org> wrote:

> You're right.  You're off the beaten path. I think everyone here would love
> to have more documentation and more comments. Of course, all of these take
> time.
>
> If you have time to volunteer to help improve these things, that would be
> great.
>
> With regards to the question about the jira, describe your use case and
> what functionality you couldn't find or make work. The active developers on
> the project can then do their best to help shape the Jira into better docs,
> javadocs and/or new functionality as time allows.
>
> On Jul 5, 2015 1:37 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
> > Uh... actually, I think that it isn't obvious because there is absolutely
> > no documentation and there are no comments in the code.
> >
> > And what should the JIRA say?  We can't even tell what's missing, if
> > anything, because we can't tell how it is supposed to work.
> >
> >
> >
> >
> > On Sun, Jul 5, 2015 at 11:50 AM, Jacques Nadeau <jacques@apache.org>
> > wrote:
> >
> > > It isn't obvious because you shouldn't do it.  Please file a JIRA to add
> > > real support for this type of output.
> > >
> > > Your current function would leak large amounts of memory that would
> > > ultimately crash the node.
> > >
> > > Realistically, there are very few internal Drill APIs that you should
> > > access via a UDF (injectables, holders, complexwriter, fieldreader and
> > > helpers).  A post 1.0 goal was to provide a UDF interface JAR to ensure
> > > people don't accidentally reach into Drill's internals.  (A later
> > > possibility is bytecode weaving to completely protect against it).
> > >
> > > J
> > >
> > > On Sun, Jul 5, 2015 at 11:36 AM, Ted Dunning <ted.dunning@gmail.com>
> > > wrote:
> > >
> > > > That was impressively non-obvious.
> > > >
> > > >
> > > >
> > > > > On Sat, Jul 4, 2015 at 6:40 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > >
> > > > > > I did get a new RepeatedBigIntHolder built and added a BigIntVector
> > > > > > to it. I'll try it in the UDF tomorrow and see if there is a
> > > > > > difference in the ways I found to get a BufferAllocator.
> > > > >
> > > > > .
> > > > > .
> > > > > .
> > > > > @Inject DrillBuf buffer;
> > > > > @Workspace RepeatedBigIntHolder yList;
> > > > > .
> > > > > .
> > > > > .
> > > > > @Override
> > > > > public void setup() {
> > > > > .
> > > > > .
> > > > > .
> > > > > > //org.apache.drill.exec.memory.BufferAllocator allocator =
> > > > > > //    buffer.getAllocator();
> > > > > > org.apache.drill.exec.memory.BufferAllocator allocator =
> > > > > >     new org.apache.drill.exec.memory.TopLevelAllocator();
> > > > > > yList = new RepeatedBigIntHolder();
> > > > > > yList.vector = new org.apache.drill.exec.vector.BigIntVector(
> > > > > >     org.apache.drill.exec.record.MaterializedField.create(
> > > > > >         new org.apache.drill.common.expression.SchemaPath("bigints",
> > > > > >             org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
> > > > > >         org.apache.drill.common.types.Types.optional(
> > > > > >             org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
> > > > > >     allocator);
> > > > > .
> > > > > .
> > > > > .
> > > > > }
> > > > >
> > > > >
> > > > >
> > > > > > On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > >
> > > > > > > I still have issues finding the correct way to create and use a
> > > > > > > RepeatedHolder, and Writers are a non-starter for Workspace values. I
> > > > > > > can make do with creating a concatenated string in a VarCharHolder for
> > > > > > > small data sets to get past this in the short term and finish testing
> > > > > > > the output values I expect, but I won't be able to do anything at scale
> > > > > > > until I figure out how to make a repeated list.
> > > > > >
> > > > > > > On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > >
> > > > > > >> Well... converting from strings to integers, anyway... Too many 4th of
> > > > > > >> July hot dogs. Going into nitrate overload. :)
> > > > > > >>
> > > > > > >> I am pulling an array of string values from JSON data. The string
> > > > > > >> values are actually integers. I am converting them to integers and
> > > > > > >> adding each array entry to the final tally.
> > > > > >>
> > > > > > >> On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > >>
> > > > > > >>> Ted,
> > > > > > >>>
> > > > > > >>> Yes, I started out just getting a basic count to work. I am trying to
> > > > > > >>> keep the workflow as close to a basic user's as possible. As such, I am
> > > > > > >>> building and using the MapR Apache Drill sandbox to test.
> > > > > >>>
> > > > > > >>>    1. Always look at the drillbits.log file to see if Drill had any
> > > > > > >>>       issues loading your UDF. That was where I learned that all
> > > > > > >>>       workspace values needed to be holders:
> > > > > > >>>       - WARN  o.a.d.exec.expr.fn.FunctionConverter - Failure loading
> > > > > > >>>         function class
> > > > > > >>>         com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1,
> > > > > > >>>         field xList. Aggregate function 'MyLinearRegression1' workspace
> > > > > > >>>         variable 'xList' is of type 'interface
> > > > > > >>>         org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
> > > > > > >>>         Please change it to Holder type.
> > > > > > >>>    2. Error messages:
> > > > > > >>>       - If you get an error in this format, it means that Drill cannot
> > > > > > >>>         find your function, so it probably didn't load it. Go back to
> > > > > > >>>         step 1:
> > > > > > >>>          - PARSE ERROR: From line 1, column 8 to line 1, column 44: No
> > > > > > >>>            match found for function signature MyFunctionName(<ANY>)
> > > > > > >>>       - If you get an error in this format, it means that the function
> > > > > > >>>         is there but Drill could not find a signature that matched the
> > > > > > >>>         param types or param numbers you were passing it. The exact
> > > > > > >>>         wording will change, but "Missing function implementation" is
> > > > > > >>>         the key phrase to look for:
> > > > > > >>>          - Error: SYSTEM ERROR:
> > > > > > >>>            org.apache.drill.exec.exception.SchemaChangeException:
> > > > > > >>>            Failure while trying to materialize incoming schema.  Errors:
> > > > > > >>>          - Error in expression at index -1.  Error: Missing function
> > > > > > >>>            implementation: [castBIGINT(VARCHAR-REPEATED)].  Full
> > > > > > >>>            expression: --UNKNOWN EXPRESSION--
> > > > > > >>>    3. In your function definition for aggregate functions you need
> > > > > > >>>       to set null processing to internal and isRandom to false.
> > > > > > >>>       Example below:
> > > > > > >>>       - @FunctionTemplate(name = "MyFunctionName", scope =
> > > > > > >>>         FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> > > > > > >>>         FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> > > > > > >>>         isBinaryCommutative = false, costCategory =
> > > > > > >>>         FunctionTemplate.FunctionCostCategory.COMPLEX)
> > > > > >>>
> > > > > > >>> Below is an example from the Apache Drill tutorial data sets contained
> > > > > > >>> in the MapR Apache Drill sandbox. I am pulling an array of string
> > > > > > >>> values from JSON data. The string values are actually integers. I am
> > > > > > >>> converting them to integers and adding each array entry to the final
> > > > > > >>> tally. This in no way represents what this data was for, but it did
> > > > > > >>> become a handy way for me to peck out the "correct" way to build an
> > > > > > >>> aggregation UDF.
> > > > > >>>
> > > > > > >>> @FunctionTemplate(name = "MyArraySum",
> > > > > > >>>     scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
> > > > > > >>>     nulls = FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> > > > > > >>>     isBinaryCommutative = false,
> > > > > > >>>     costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
> > > > > > >>> public static class MyArraySum implements DrillAggFunc {
> > > > > > >>>
> > > > > > >>>   @Param RepeatedVarCharHolder listToSearch;
> > > > > > >>>   @Workspace NullableBigIntHolder count;
> > > > > > >>>   @Workspace NullableBigIntHolder sum;
> > > > > > >>>   @Workspace NullableVarCharHolder vc;
> > > > > > >>>   @Output BigIntHolder out;
> > > > > > >>>
> > > > > > >>>   @Override
> > > > > > >>>   public void setup() {
> > > > > > >>>     count.value = 0;
> > > > > > >>>     sum.value = 0;
> > > > > > >>>   }
> > > > > > >>>
> > > > > > >>>   @Override
> > > > > > >>>   public void add() {
> > > > > > >>>     // Number of entries in this row's array.
> > > > > > >>>     int c = listToSearch.end - listToSearch.start;
> > > > > > >>>     int val = 0;
> > > > > > >>>     try {
> > > > > > >>>       for (int i = 0; i < c; i++) {
> > > > > > >>>         // Offset by start so each row reads its own slice of the vector.
> > > > > > >>>         listToSearch.vector.getAccessor().get(listToSearch.start + i, vc);
> > > > > > >>>         String inputStr =
> > > > > > >>>             org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
> > > > > > >>>                 .toStringFromUTF8(vc.start, vc.end, vc.buffer);
> > > > > > >>>         val = Integer.parseInt(inputStr);
> > > > > > >>>         sum.value = sum.value + val;
> > > > > > >>>       }
> > > > > > >>>     } catch (Exception e) {
> > > > > > >>>       val = 0;
> > > > > > >>>     }
> > > > > > >>>     count.value = count.value + 1;
> > > > > > >>>   }
> > > > > > >>>   // ... (output() and reset() not shown)
> > > > > > >>> }
> > > > > >>>
> > > > > > >>> Example select statement:
> > > > > > >>> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
> > > > > > >>> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
> > > > > >>>
> > > > > > >>> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>>> Jim,
> > > > > > >>>>
> > > > > > >>>> I think that you may be having trouble with aggregators in general.
> > > > > > >>>>
> > > > > > >>>> Have you been able to build *any* aggregator of anything?  I haven't.
> > > > > > >>>>
> > > > > > >>>> When I try to build an aggregator of ints or doubles, I get a very
> > > > > > >>>> persistent problem with Drill even seeing my aggregates:
> > > > > >>>>
> > > > > > >>>> 0: jdbc:drill:zk=local> *select sum_int(employee_id) from cp.`employee.json`;*
> > > > > > >>>>
> > > > > > >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
> > > > > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> > > > > > >>>> found for function signature sum_int(<ANY>)
> > > > > > >>>>
> > > > > > >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException <init>
> > > > > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > > > > >>>> column 8 to line 1, column 27: No match found for function signature
> > > > > > >>>> sum_int(<ANY>)
> > > > > > >>>>
> > > > > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
> > > > > > >>>> found for function signature sum_int(<ANY>)*
> > > > > > >>>>
> > > > > > >>>> *[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010]
> > > > > > >>>> (state=,code=0)*
> > > > > > >>>>
> > > > > > >>>> 0: jdbc:drill:zk=local> *select sum_int(cast(employee_id as int)) from
> > > > > > >>>> cp.`employee.json`*;
> > > > > > >>>>
> > > > > > >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
> > > > > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> > > > > > >>>> found for function signature sum_int(<NUMERIC>)
> > > > > > >>>>
> > > > > > >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException <init>
> > > > > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > > > > >>>> column 8 to line 1, column 40: No match found for function signature
> > > > > > >>>> sum_int(<NUMERIC>)
> > > > > > >>>>
> > > > > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
> > > > > > >>>> found for function signature sum_int(<NUMERIC>)*
> > > > > > >>>>
> > > > > > >>>> *[Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010]
> > > > > > >>>> (state=,code=0)*
> > > > > >>>>
> > > > > >>>> 0: jdbc:drill:zk=local>
> > > > > >>>>
> > > > > >>>>
> > > > > > >>>> It looks like there is some undocumented subtlety about how to
> > > > > > >>>> register an aggregator.
> > > > > > >>>>
> > > > > > >>>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > >>>>
> > > > > > >>>> > I'm working on the same thing. I want to aggregate a list of values.
> > > > > > >>>> > It has been a search-and-guess game for the most part. I'm still
> > > > > > >>>> > stuck in the process of getting the values all into a list. The
> > > > > > >>>> > writers look interesting, but for aggregation functions it looks
> > > > > > >>>> > like the input is the param, and output objects can't hold the
> > > > > > >>>> > aggregation steps. The Workspace is where that happens. If I try to
> > > > > > >>>> > use a Writer in a workspace, it won't load and tells me to change it
> > > > > > >>>> > to Holders, which was why I was using them to start with. Maybe I'm
> > > > > > >>>> > missing the architecture of the agg function. It looked like it
> > > > > > >>>> > was....
> > > > > > >>>> >
> > > > > > >>>> > @Param comes in -> initialize @Workspace vars in setup -> process
> > > > > > >>>> > data through @Workspace vars in add -> finalize @Output in output.
> > > > > > >>>> >
> > > > > > >>>> > So I'm back to trying to figure out how to create a
> > > > > > >>>> > RepeatedBigIntHolder or a RepeatedVarCharHolder...
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> >
> > > > > > >>>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > >>>> >
> > > > > > >>>> > > I am working on trying to build any kind of list-constructing
> > > > > > >>>> > > aggregator and having absolute fits.
> > > > > > >>>> > >
> > > > > > >>>> > > To simplify life, I decided to just build a generic list builder
> > > > > > >>>> > > that is a scalar function that returns a list containing its
> > > > > > >>>> > > argument.  Thus zoop(3) => [3], zoop('abc') => ['abc'] and
> > > > > > >>>> > > zoop([1,2,3]) => [[1,2,3]].
> > > > > > >>>> > >
> > > > > > >>>> > > The ComplexWriter looks like the place to go. As usual, the
> > > > > > >>>> > > complete lack of comments in most of Drill makes this very hard
> > > > > > >>>> > > since I have to guess what works and what doesn't.
> > > > > > >>>> > >
> > > > > > >>>> > > In my code, I note that ComplexWriter has a nice rootAsList()
> > > > > > >>>> > > method.  I used this in zip and it works nicely to construct lists
> > > > > > >>>> > > for output.  I note that the resulting ListWriter has a method
> > > > > > >>>> > > copyReader(FieldReader var1) which looks really good.
> > > > > > >>>> > >
> > > > > > >>>> > > Unfortunately, the only implementation of copyReader() is in
> > > > > > >>>> > > AbstractFieldWriter and it looks like this:
> > > > > > >>>> > >
> > > > > > >>>> > > public void copyReader(FieldReader reader) {
> > > > > > >>>> > >     this.fail("Copy FieldReader");
> > > > > > >>>> > > }
> > > > > > >>>> > >
> > > > > > >>>> > > I would like to formally say at this point "WTF"?
> > > > > > >>>> > >
> > > > > > >>>> > > In digging in further, I see other methods that look handy, like
> > > > > > >>>> > >
> > > > > > >>>> > > public void write(IntHolder holder) {
> > > > > > >>>> > >     this.fail("Int");
> > > > > > >>>> > > }
> > > > > > >>>> > >
> > > > > > >>>> > > And then in looking at implementations, it looks like there is a
> > > > > > >>>> > > combinatorial explosion because every type seems to need a write
> > > > > > >>>> > > method for every other type.
> > > > > > >>>> > >
> > > > > > >>>> > > What is the thought here?  How can I copy an arbitrary value into
> > > > > > >>>> > > a list?
> > > > > > >>>> > >
> > > > > > >>>> > > My next thought was to build code that dispatches on type.  There
> > > > > > >>>> > > is a method called getType() on the FieldReader.  Unfortunately,
> > > > > > >>>> > > that drives into code generated by protoc and I see no way to
> > > > > > >>>> > > dispatch on the type of an incoming value.
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > > How is this supposed to work?
> > > > > >>>> > >
> > > > > >>>> > >
> > > > > >>>> > >
> > > > > >>>> > >
> > > > > > >>>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <baid.mehant@gmail.com> wrote:
> > > > > > >>>> > >
> > > > > > >>>> > > > For a detailed example on using the ComplexWriter interface you
> > > > > > >>>> > > > can take a look at the Mappify
> > > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>
> > > > > > >>>> > > > (kvgen) function. The function itself is very simple; however, it
> > > > > > >>>> > > > makes use of the utility methods in MappifyUtility
> > > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
> > > > > > >>>> > > > and MapUtility
> > > > > > >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>
> > > > > > >>>> > > > which perform most of the work.
> > > > > > >>>> > > >
> > > > > > >>>> > > > Currently we don't have a generic infrastructure to handle errors
> > > > > > >>>> > > > coming out of functions. However, there is UserException, which
> > > > > > >>>> > > > when raised will make sure that Drill does not gobble up the
> > > > > > >>>> > > > error message in that exception. So you can probably throw a
> > > > > > >>>> > > > UserException with the failing input in your function to make
> > > > > > >>>> > > > sure it propagates to the user.
> > > > > > >>>> > > >
> > > > > > >>>> > > > Thanks
> > > > > > >>>> > > > Mehant
> > > > > >>>> > > >
> > > > > > >>>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <jacques@apache.org> wrote:
> > > > > > >>>> > > >
> > > > > > >>>> > > > > *Holders are for both input and output.  You can also use
> > > > > > >>>> > > > > ComplexWriter for output and FieldReader for input if you want
> > > > > > >>>> > > > > to write or read a complex value.
> > > > > > >>>> > > > >
> > > > > > >>>> > > > > I don't think we've provided a really clean way to construct a
> > > > > > >>>> > > > > Repeated*Holder for output purposes.  You can probably do it by
> > > > > > >>>> > > > > reaching into a bunch of internal interfaces in Drill.
> > > > > > >>>> > > > > However, I would recommend using the ComplexWriter output
> > > > > > >>>> > > > > pattern for now.  This will be a little less efficient but
> > > > > > >>>> > > > > substantially less brittle.  I suggest you open up a jira for
> > > > > > >>>> > > > > using a Repeated*Holder as an output.
> > > > > >>>> > > > >
> > > > > > >>>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > >>>> > > > >
> > > > > > >>>> > > > > > Holders are for input, I think.
> > > > > > >>>> > > > > >
> > > > > > >>>> > > > > > Try the different kinds of writers.
> > > > > >>>> > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > > >
> > > > > > >>>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > >>>> > > > > >
> > > > > > >>>> > > > > > > I've got a RepeatedHolder working as a @Param. I was
> > > > > > >>>> > > > > > > working on a custom aggregator function using
> > > > > > >>>> > > > > > > DrillAggFunc. In this I can do simple things, but if I
> > > > > > >>>> > > > > > > want to build a list of values and do something with it in
> > > > > > >>>> > > > > > > the final output method, I think I need to use
> > > > > > >>>> > > > > > > RepeatedHolders in the @Workspace. To do that I need to
> > > > > > >>>> > > > > > > create a new one in the setup method. I can't get one
> > > > > > >>>> > > > > > > built. They all require a BufferAllocator to be passed in
> > > > > > >>>> > > > > > > to build it. I have not found a way to get an allocator
> > > > > > >>>> > > > > > > yet. Any suggestions?
> > > > > >>>> > > > > > >
> > > > > > >>>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > >>>> > > > > > >
> > > > > > >>>> > > > > > > > If you look at the zip function in
> > > > > > >>>> > > > > > > > https://github.com/mapr-demos/simple-drill-functions
> > > > > > >>>> > > > > > > > you can have an example of building a structure.
> > > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > > > The basic idea is that your output is denoted as
> > > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > > >         @Output
> > > > > > >>>> > > > > > > >         BaseWriter.ComplexWriter writer;
> > > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > > > The pattern for building a list of lists of integers is
> > > > > > >>>> > > > > > > > like this:
> > > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > > >         writer.setValueCount(n);
> > > > > > >>>> > > > > > > >         ...
> > > > > > >>>> > > > > > > >         BaseWriter.ListWriter outer = writer.rootAsList();
> > > > > > >>>> > > > > > > >         outer.start(); // [ outer list
> > > > > > >>>> > > > > > > >         ...
> > > > > > >>>> > > > > > > >         // for each inner list
> > > > > > >>>> > > > > > > >             BaseWriter.ListWriter inner = outer.list();
> > > > > > >>>> > > > > > > >             inner.start();
> > > > > > >>>> > > > > > > >             // for each inner list element
> > > > > > >>>> > > > > > > >                 inner.integer().writeInt(accessor.get(i));
> > > > > > >>>> > > > > > > >             }
> > > > > > >>>> > > > > > > >             inner.end();   // ] inner list
> > > > > > >>>> > > > > > > >         }
> > > > > > >>>> > > > > > > >         outer.end(); // ] outer list
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates <jbates@maprtech.com> wrote:
> > > > > > >>>> > > > > > > >
> > > > > > >>>> > > > > > > > > I have working aggregation and simple UDFs. I've been
> > > > > > >>>> > > > > > > > > trying to document and understand each of the options
> > > > > > >>>> > > > > > > > > available in a Drill UDF: the different
> > > > > > >>>> > > > > > > > > FunctionScopes, the ones that are allowed and the ones
> > > > > > >>>> > > > > > > > > that are not; the impact of different cost categories;
> > > > > > >>>> > > > > > > > > and the different steps needed to handle any of the
> > > > > > >>>> > > > > > > > > supported data types and structures in Drill.
> > > > > > >>>> > > > > > > > >
> > > > > > >>>> > > > > > > > > Here are a few of my current road blocks. Any pointers
> > > > > > >>>> > > > > > > > > would be greatly appreciated.
> > > > > > >>>> > > > > > > > >
> > > > > > >>>> > > > > > > > >
> > > > > > >>>> > > > > > > > >    1. I've been trying to understand how to correctly
> > > > > > >>>> > > > > > > > >    use RepeatedHolders of whatever type. For this
> > > > > > >>>> > > > > > > > >    discussion let's start with a RepeatedBigIntHolder.
> > > > > > >>>> > > > > > > > >    I'm trying to figure out the best way to create a
> > > > > > >>>> > > > > > > > >    new one. I have not figured out where in the
> > > > > > >>>> > > > > > > > >    existing Drill code someone does this. If I use a
> > > > > > >>>> > > > > > > > >    RepeatedBigIntHolder as a Workspace object, it is
> > > > > > >>>> > > > > > > > >    null to start with. I created a new one in the
> > > > > > >>>> > > > > > > > >    startup section of the UDF, but the vector was null.
> > > > > > >>>> > > > > > > > >    I can find no reference on creating a new
> > > > > > >>>> > > > > > > > >    BigIntVector. There is a way to create a
> > > > > > >>>> > > > > > > > >    BigIntVector, and I did find an example of creating
> > > > > > >>>> > > > > > > > >    a new VarCharVector, but I can't do that using the
> > > > > > >>>> > > > > > > > >    Drill jar files from 1.0. The
> > > > > > >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos and the
> > > > > > >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos.MinorType
> > > > > > >>>> > > > > > > > >    classes do not appear to be accessible from the
> > > > > > >>>> > > > > > > > >    Drill jar files.
> > > > > > >>>> > > > > > > > >    2. What is the best way to close out a UDF in the
> > > > > > >>>> > > > > > > > >    event it generates an exception? Are there specific
> > > > > > >>>> > > > > > > > >    steps one should follow to make a clean exit in a
> > > > > > >>>> > > > > > > > >    catch block that are beneficial to Drill?
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
