drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-4776) Errata, questions about UDF documentation
Date Wed, 13 Jul 2016 00:38:20 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-4776:
-------------------------------
    Description: 
See the documentation at https://drill.apache.org/docs/develop-custom-functions-introduction/

"Simple Function: A simple function operates on a single row and produces a single row as
the output."

Some explanation is needed. In SQL, the function accepts a single column and produces a new
column as output: SELECT myFunc( x ) FROM y; The example string and math functions are, in
fact, column (technically "scalar") functions.

Process, item 3: Explain why Drill needs the source files.

On this page: https://drill.apache.org/docs/developing-a-simple-function/

Step 1 has a Maven dependency on version 1.1.0 of Drill. It is probably obvious to most folks,
but the user must replace the 1.1.0 with the version of Drill that is running on their cluster.

Step 3: it is not clear if the "bit holders" are parameters to a function or are member variables
into which values are injected. Some more background about the runtime flow would help answer
this question. That is, what does Drill do with the class? When is an instance created? How
are values passed in?
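
To make the question concrete, here is a minimal sketch of the "injected member variable" reading. It is a toy model only: ParamSketch, OutputSketch, IntBox, AddOneSketch and HolderInjectionDemo are hypothetical names invented for illustration, not Drill classes, and the reflection-based "engine" is an assumption (Drill's actual runtime may work quite differently, for example through code generation).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical stand-ins; NOT Drill's real annotations or holder classes.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ParamSketch {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface OutputSketch {}

// A mutable box for one int value, playing the role of a "holder".
class IntBox {
  int value;
}

// The function declares its input and output as annotated fields,
// not as method arguments or return values.
class AddOneSketch {
  @ParamSketch IntBox in;
  @OutputSketch IntBox out;

  public void setup() {}

  public void eval() {
    out.value = in.value + 1;
  }
}

class HolderInjectionDemo {
  // Toy "engine": creates an instance, fills the annotated fields,
  // then calls setup() and eval() with no arguments.
  static int callOnce(int input) {
    AddOneSketch fn = new AddOneSketch();
    try {
      for (Field f : AddOneSketch.class.getDeclaredFields()) {
        f.setAccessible(true);
        if (f.isAnnotationPresent(ParamSketch.class)) {
          IntBox box = new IntBox();
          box.value = input;
          f.set(fn, box);
        } else if (f.isAnnotationPresent(OutputSketch.class)) {
          f.set(fn, new IntBox());
        }
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("injection failed", e);
    }
    fn.setup();
    fn.eval();
    return fn.out.value;
  }

  public static void main(String[] args) {
    System.out.println(callOnce(41));
  }
}
```

If the holders are instead ordinary method parameters, eval( ) would take and return values directly; the documentation should say which model applies.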

Step 4: are setup( ) and eval( ) overrides? If so, add the standard @Override annotation to
help the user understand that these are overrides. Otherwise, these might be "magic method
names" (like "main"), so the user has to know to use exactly those names (and signatures).
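
As an illustration of what the annotation buys the reader, here is a minimal sketch that assumes setup( ) and eval( ) are declared by a base interface. SimpleFuncSketch, UpperCaseSketch and OverrideDemo are hypothetical names, not the types used in the Drill documentation.

```java
import java.util.Locale;

// Hypothetical contract; the documentation under review does not state it.
interface SimpleFuncSketch {
  void setup();
  void eval();
}

class UpperCaseSketch implements SimpleFuncSketch {
  String in;
  String out;

  // @Override makes the compiler verify that each method matches a
  // declaration in the interface; a wrong signature such as
  // eval(String) would be rejected at compile time.
  @Override
  public void setup() {}

  @Override
  public void eval() {
    out = in.toUpperCase(Locale.ROOT);
  }
}

class OverrideDemo {
  // Calls the function through the interface, as a framework would.
  static String run(String input) {
    UpperCaseSketch fn = new UpperCaseSketch();
    fn.in = input;
    SimpleFuncSketch contract = fn;
    contract.setup();
    contract.eval();
    return fn.out;
  }

  public static void main(String[] args) {
    System.out.println(run("drill"));
  }
}
```

If the names are instead matched by convention, no such compile-time check exists, which is why the documentation should state the exact names and signatures.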

Step 4: explain the purpose of the setup( ) method. When is it called? Once per Drillbit session?
Once per fragment? Once per row? How do we intend it to be used? (This method is described
on the aggregates page; perhaps just say, "the setup method is described later in this tutorial.")

Step 5: "Verify that an empty drill-module.conf is included in the resources folder." This
is after the compile step. But, that file won't exist unless the user adds it to their source
tree. Should we include such a step, say after step 1? Also, note that in Step 2 of the previous
page, we say that the file must contain "drill.classpath.scanning.packages += "com.yourgroupidentifier.udf"".
Which is right?

Step 5: "add it to etc/drill/conf." This seems to be a vestige of an undocumented feature
that looks for Drill configuration in that path. The more typical place is $DRILL_HOME/conf.
(Or, in Drill 1.8, $DRILL_SITE.) But note that only ONE file named drill-module.conf can
exist there. Since the file is intended to hold module-specific config, the config dir is an awkward
place for it. Overall, this is probably just plain wrong.

On this page: https://drill.apache.org/docs/tutorial-develop-a-simple-function/

Step 1: Has the same problem of naming an old version of Drill. Update it to 1.7.0, or simply say
to use the user's own Drill version.

Step 3: We omit the import declaration for the Param annotation. Is it org.apache.drill.exec.expr.annotations.Param?

Step 4: As above, we need the import. org...Output? Also, Inject.

For above, when introducing a new annotation, explain that the annotations are described later
in the tutorial.

Also, we should provide a link to the Javadoc for the classes and annotations described here.
(Javadoc is the only way that a developer has to figure out the actual uses.) If we don't
have such Javadoc on Apache, we should add it.

Step 5: Again, when is setup( ) called? Once per Drillbit? Once per query? Once per method
call?
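
The answer matters because it determines what belongs in setup( ). Here is a minimal sketch of one possible contract, in which setup( ) runs once per function instance and eval( ) runs once per row. That contract is an assumption made for illustration, and DigitCounterSketch and SetupLifecycleDemo are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical function: counts the digits in each input string.
class DigitCounterSketch {
  String in;
  int out;
  int setupCalls;          // records how often setup() ran
  private Pattern digits;  // costly state, built once in setup()

  void setup() {
    digits = Pattern.compile("\\d");
    setupCalls++;
  }

  void eval() {
    Matcher m = digits.matcher(in);
    int count = 0;
    while (m.find()) {
      count++;
    }
    out = count;
  }
}

class SetupLifecycleDemo {
  // Assumed contract: setup() once per instance, then eval() once per row.
  static int[] countDigits(String[] rows) {
    DigitCounterSketch fn = new DigitCounterSketch();
    fn.setup();
    int[] results = new int[rows.length];
    for (int i = 0; i < rows.length; i++) {
      fn.in = rows[i];
      fn.eval();
      results[i] = fn.out;
    }
    if (fn.setupCalls != 1) {
      throw new IllegalStateException("setup() should run exactly once");
    }
    return results;
  }

  public static void main(String[] args) {
    for (int n : countDigits(new String[] {"a1b22", "none", "2016"})) {
      System.out.println(n);
    }
  }
}
```

If setup( ) were instead called once per row, compiling the pattern there would give no benefit, so the documentation should state the lifecycle explicitly.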

Step 5: The code is a bit strange. This is more than a doc issue; the whole code gen thing
needs thinking about. It will be very difficult to debug the function if part of it is code
generated only in the Drill server...

On this page: https://drill.apache.org/docs/adding-custom-functions-to-drill/

Change the following for Drill 1.8 or later:

Step 1, "copy them to <drill installation directory>/jars/3rdparty." Change to "copy
them to $DRILL_SITE/jars".

(The above change moves user jars out of the Drill distribution directory, making upgrades
much simpler.)


> Errata, questions about UDF documentation
> -----------------------------------------
>
>                 Key: DRILL-4776
>                 URL: https://issues.apache.org/jira/browse/DRILL-4776
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.7.0
>            Reporter: Paul Rogers
>            Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
