Return-Path: X-Original-To: apmail-drill-commits-archive@www.apache.org Delivered-To: apmail-drill-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 003E2180F2 for ; Mon, 4 May 2015 19:26:35 +0000 (UTC) Received: (qmail 97135 invoked by uid 500); 4 May 2015 19:26:34 -0000 Delivered-To: apmail-drill-commits-archive@drill.apache.org Received: (qmail 97062 invoked by uid 500); 4 May 2015 19:26:34 -0000 Mailing-List: contact commits-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: commits@drill.apache.org Delivered-To: mailing list commits@drill.apache.org Received: (qmail 95974 invoked by uid 99); 4 May 2015 19:26:34 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 04 May 2015 19:26:34 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 25C49E0979; Mon, 4 May 2015 19:26:34 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: tshiran@apache.org To: commits@drill.apache.org Date: Mon, 04 May 2015 19:26:55 -0000 Message-Id: <14daea48f72749f7b40f793f4e05aad7@git.apache.org> In-Reply-To: <4d522f2cbb3a4fb98f9eb392f0c84959@git.apache.org> References: <4d522f2cbb3a4fb98f9eb392f0c84959@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: [23/51] [partial] drill-site git commit: Initial commit http://git-wip-us.apache.org/repos/asf/drill-site/blob/c4de0f83/docs/custom-function-interfaces/index.html ---------------------------------------------------------------------- diff --git a/docs/custom-function-interfaces/index.html b/docs/custom-function-interfaces/index.html new file mode 100644 index 0000000..c464985 --- /dev/null +++ b/docs/custom-function-interfaces/index.html @@ -0,0 +1,974 @@ + + + + + + + + + +Custom 
Function Interfaces - Apache Drill

Custom Function Interfaces

+ +
+ + + +
+ +

Implement the Drill interface appropriate for the type of function that you want to develop. Each interface provides a set of required holders, where you declare the data types that your function uses, and a set of required methods that Drill calls to perform your function’s operations.

+ +

Simple Function Interface

+ +

When you develop a simple function, you implement the DrillSimpleFunc interface. The function name is the value that you assign to the name attribute of the @FunctionTemplate annotation. For example, the name of the following simple function is myaddints:

+
@FunctionTemplate(name = "myaddints", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
+public static class Add1 implements DrillSimpleFunc{
+
+

The nulls = NullHandling.NULL_IF_NULL setting tells Drill to return NULL whenever an input value is NULL, so your function code does not have to check for NULL itself. For most scenarios, this setting should suffice. If you want your function to handle NULL values itself, change the setting to nulls = NullHandling.INTERNAL.
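
To make the NULL_IF_NULL behavior concrete, the following plain-Java sketch mimics it for a two-argument addition. It does not use Drill's classes: NullIfNullSketch, add, and addOrNull are hypothetical names chosen for illustration, and boxed Long values stand in for nullable column values.

```java
// Illustration only: plain Java, not Drill's API. All names here are hypothetical.
final class NullIfNullSketch {

  // The "function body": it only ever sees non-null inputs.
  static long add(long a, long b) {
    return a + b;
  }

  // The wrapper plays the role of the engine: if any input is null,
  // the result is null and the function body is skipped.
  static Long addOrNull(Long a, Long b) {
    if (a == null || b == null) {
      return null;
    }
    return add(a, b);
  }

  public static void main(String[] args) {
    System.out.println(addOrNull(2L, 3L));   // prints 5
    System.out.println(addOrNull(null, 3L)); // prints null
  }
}
```

With a setting such as NullHandling.INTERNAL, the check inside addOrNull would instead be the responsibility of the function body.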

+ +

The simple function interface includes holders where you indicate the data types that your function can process.

+ +

The following table provides a list of the holders and their descriptions, with examples:

+ +
Holder | Description | Example
@Param | Indicates the data type that the function processes as input and determines the number of parameters that your function accepts within the query. | @Param BigIntHolder input1; @Param BigIntHolder input2;
@Output | Indicates the data type that the processing returns. | @Output BigIntHolder out;
+ +

The simple function interface also includes two required methods that Drill calls when processing a query with the function.

+ +

The following table lists the required methods:

+ +
Method | Description
setup() | Performs the initialization and any other processing that Drill needs to perform only once.
eval() | Contains the code that tells Drill what operations to perform on columns of data. You add your custom code to this method.
+ +
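
The division of work between the two methods can be sketched in plain Java as follows. This is a stand-in for the call pattern described in the table, not Drill's implementation: SimpleFuncLifecycleSketch, LongBox, AddInts, and run are hypothetical names, and the run method only imitates an engine that calls setup() once and eval() for every row.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stand-in for the call pattern, not Drill's classes.
final class SimpleFuncLifecycleSketch {

  // Hypothetical holder: a mutable box for one long value.
  static final class LongBox {
    long value;
  }

  // Hypothetical function object with the same shape as the myaddints example.
  static final class AddInts {
    final LongBox input1 = new LongBox();
    final LongBox input2 = new LongBox();
    final LongBox out = new LongBox();
    int setupCalls = 0;

    void setup() {
      setupCalls++; // one-time initialization would go here
    }

    void eval() {
      out.value = input1.value + input2.value;
    }
  }

  // Plays the role of the engine: setup() once, then eval() for every row.
  static List<Long> run(AddInts fn, long[][] rows) {
    List<Long> results = new ArrayList<>();
    fn.setup();
    for (long[] row : rows) {
      fn.input1.value = row[0];
      fn.input2.value = row[1];
      fn.eval();
      results.add(fn.out.value);
    }
    return results;
  }

  public static void main(String[] args) {
    AddInts fn = new AddInts();
    System.out.println(run(fn, new long[][] {{1, 2}, {10, 20}})); // prints [3, 30]
    System.out.println(fn.setupCalls);                            // prints 1
  }
}
```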

Example

+ +

The following example shows the program created for the myaddints function:

+
package org.apache.drill.udfs;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.drill.exec.expr.DrillAggFunc;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.IntHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.record.RecordBatch;
+
+public class MyUdfs {
+
+  @FunctionTemplate(name = "myaddints", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
+  public static class Add1 implements DrillSimpleFunc{
+
+    @Param BigIntHolder input1;
+    @Param BigIntHolder input2;
+    @Output BigIntHolder out;
+    public void setup(RecordBatch b){}
+
+    public void eval(){
+      out.value = input1.value + input2.value;
+    }
+  }
+}
+

Aggregate Function Interface

+ +

When you develop an aggregate function, you implement the DrillAggFunc interface. The function name is the value that you assign to the name attribute of the @FunctionTemplate annotation. For example, the name of the following aggregate function is mysecondmin:

+
@FunctionTemplate(name = "mysecondmin", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+public static class MySecondMin implements DrillAggFunc {
+
+

The aggregate function interface includes holders where you indicate the data types that your function can process.

+ +

The following table provides a list of the holders and their descriptions, with examples:

+ +
Holder | Description | Example
@Param | Indicates the data type that the function processes as input and determines the number of parameters that your function accepts within the query. | @Param BigIntHolder in;
@Workspace | Indicates the data type used to store intermediate data during processing. | @Workspace BigIntHolder min; @Workspace BigIntHolder secondMin;
@Output | Indicates the data type that the processing returns. | @Output BigIntHolder out;
+ +

The aggregate function interface also includes four required methods that Drill calls when processing a query with the function.

+ +

The following table lists the required methods:

+ +
Method | Description
setup() | Performs the initialization and any other processing that Drill needs to perform only once.
add() | Processes every record. It applies the function to each value in a column that Drill processes.
output() | Returns the final result of the aggregate function, that is, the value computed by the add() method. This is the last method that Drill calls. Drill calls it one time after processing all the records.
reset() | You provide the code in this method that determines the action Drill takes when data types in a column change from one type to another, for example from int to float. Before processing schema-less data, Drill scans the data and implicitly tries to identify the data type associated with each column. If Drill cannot identify a schema associated with each column, Drill processes a column assuming that the column contains a certain data type. If Drill encounters another data type in the column, Drill calls the reset method to determine how to handle the scenario.
+ +
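
The order in which the four methods cooperate can be sketched in plain Java. The class below is not Drill's API: SecondMinSketch and secondMinOf are hypothetical names, ordinary long fields stand in for the @Workspace holders, and reset() here simply restores the starting state, which is an assumption made for the sketch rather than a statement of what Drill requires.

```java
// Illustration only: plain Java, not Drill's API. All names are hypothetical.
final class SecondMinSketch {
  // Workspace: intermediate state kept between add() calls.
  private long min;
  private long secondMin;

  // One-time initialization.
  void setup() {
    reset();
  }

  // Called once per record.
  void add(long in) {
    if (in < min) {
      secondMin = min; // the old minimum becomes the runner-up
      min = in;
    } else if (in < secondMin) {
      secondMin = in;
    }
  }

  // Called once after all records have been added.
  long output() {
    return secondMin;
  }

  // Assumption for this sketch: restore the starting state.
  void reset() {
    min = Long.MAX_VALUE;
    secondMin = Long.MAX_VALUE;
  }

  // Plays the role of the engine: setup(), add() per value, then output().
  static long secondMinOf(long... values) {
    SecondMinSketch agg = new SecondMinSketch();
    agg.setup();
    for (long v : values) {
      agg.add(v);
    }
    return agg.output();
  }

  public static void main(String[] args) {
    System.out.println(secondMinOf(5, 3, 8, 1)); // prints 3
  }
}
```

The sketch treats duplicates as separate values, so the second minimum of 1, 1 is 1.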

Example

+ +

The following example shows the program created for the mysecondmin function:

+
package org.apache.drill.udfs;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.drill.exec.expr.DrillAggFunc;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.IntHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.record.RecordBatch;
+
+public class MyUdfs {
+
+  @FunctionTemplate(name = "mysecondmin", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class MySecondMin implements DrillAggFunc {
+    @Param BigIntHolder in;
+    @Workspace BigIntHolder min;
+    @Workspace BigIntHolder secondMin;
+    @Output BigIntHolder out;
+    public void setup(RecordBatch b) {
+      min = new BigIntHolder();
+      secondMin = new BigIntHolder();
+      // Sentinel start values; inputs are assumed to be smaller than this.
+      min.value = 999999999;
+      secondMin.value = 999999999;
+    }
+
+    @Override
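+    public void add() {
+      if (in.value < min.value) {
+        // The old minimum becomes the second-smallest value.
+        secondMin.value = min.value;
+        min.value = in.value;
+      } else if (in.value < secondMin.value) {
+        secondMin.value = in.value;
+      }
+    }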
+    @Override
+    public void output() {
+      out.value = secondMin.value;
+    }
+    @Override
+    public void reset() {
+      // Restore the same sentinel values that setup() uses.
+      min.value = 999999999;
+      secondMin.value = 999999999;
+    }
+
+   }
+}
http://git-wip-us.apache.org/repos/asf/drill-site/blob/c4de0f83/docs/data-sources-and-file-formats-introduction/index.html
----------------------------------------------------------------------
diff --git a/docs/data-sources-and-file-formats-introduction/index.html b/docs/data-sources-and-file-formats-introduction/index.html
new file mode 100644
index 0000000..aeca3eb
--- /dev/null
+++ b/docs/data-sources-and-file-formats-introduction/index.html
@@ -0,0 +1,863 @@
+Data Sources and File Formats Introduction - Apache Drill

Data Sources and File Formats Introduction

+ +
+ + + +
+ +

Drill supports the following key data sources:

+ +
  • HBase
  • Hive
  • MapR-DB
  • File system
+ +

Drill supports the following input formats for data:

+ +
  • CSV (comma-separated values)
  • TSV (tab-separated values)
  • PSV (pipe-separated values)
  • Parquet
  • JSON
+ +

You set the input format for data that Drill reads from a data source in the workspace portion of the storage plugin definition. The default input format in Drill is Parquet.

+ +

To set the output format of Drill data, you change the store.format option, which is listed in the sys.options table. The default storage format for Drill CREATE TABLE AS (CTAS) statements is Parquet.
