Deploying and Using a Hive UDF

If the extensive set of Hive functions that Drill supports, such as the mathematical and date functions, does not meet your needs, you can use a Hive UDF in Drill queries. Drill supports your existing Hive scalar UDFs. You can query Hive tables and access existing Hive input/output formats, including custom SerDes. Drill complements Hive deployments by offering low-latency queries.

Creating the UDF

You create the JAR for a UDF to use in Drill in the conventional way, with a few caveats covered in this section: use a unique function name and include a Drill resource file in the JAR.

  1. Use a unique name for the Hive UDF to avoid conflicts with Drill custom functions of the same name.
  2. Create a custom Hive UDF using either of these APIs (see the sketch after this list):
     • Simple API: org.apache.hadoop.hive.ql.exec.UDF
     • Complex API: org.apache.hadoop.hive.ql.udf.generic.GenericUDF
  3. Create an empty drill-module.conf file in the resources directory of the Java project.
  4. Export the logic to a JAR, including the drill-module.conf file in resources.
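
As an illustration, the following is a minimal sketch of a scalar UDF written against the Simple API. The package and class names are placeholders, and the function implements the upper-to-lower example used later on this page:

    package org.example.hive.udf;              // placeholder package name

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Simple-API UDF: evaluate() is called once per row.
    @Description(name = "upper-to-lower", value = "_FUNC_(str) - returns str in lower case")
    public class UpperToLower extends UDF {
      public Text evaluate(Text input) {
        if (input == null) {
          return null;                         // pass NULL through unchanged
        }
        return new Text(input.toString().toLowerCase());
      }
    }

Compile the class against the Hive exec library and export it, together with the empty drill-module.conf, to the UDF JAR.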

The drill-module.conf file defines startup options and makes the functions in the JAR available to queries throughout the Hadoop cluster. After exporting the UDF logic to a JAR file, set up the UDF in Drill. Drill users can then call the custom UDF in their Drill queries.

Setting Up a UDF

After you export the custom UDF as a JAR, perform the UDF setup tasks so Drill can access the UDF. The JAR needs to be available at query execution time as a session resource, so Drill queries can refer to the UDF by its name.

To set up the UDF:

  1. Register Hive. Register a Hive storage plugin that connects Drill to a Hive data source.
  2. In Drill 0.7 and later, add the JAR for the UDF to the Drill CLASSPATH. In earlier versions of Drill, place the JAR file in the /jars/3rdparty directory of the Drill installation on all nodes running a Drillbit.
  3. On each Drill node in the cluster, restart the Drillbit:

     <drill installation directory>/bin/drillbit.sh restart

Using a UDF

Use a Hive UDF just as you would use a Drill custom function. For example, to query using a Hive UDF named upper-to-lower that takes a column.value argument, the SELECT statement looks something like this:

    SELECT upper-to-lower(my_column.myvalue) FROM mytable;
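
From an application, you can issue the same query through Drill's JDBC driver. The sketch below is an assumption-laden example, not part of the documentation: the ZooKeeper host and port are placeholders, and the Drill JDBC driver JAR is assumed to be on the application classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveUdfFromDrill {
      public static void main(String[] args) throws Exception {
        // Placeholder connection string; point zk= at your ZooKeeper quorum.
        String url = "jdbc:drill:zk=zkhost1:2181";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // Same query as the SELECT statement above, using the Hive UDF by name.
             ResultSet rs = stmt.executeQuery(
                 "SELECT upper-to-lower(my_column.myvalue) FROM mytable")) {
          while (rs.next()) {
            System.out.println(rs.getString(1));
          }
        }
      }
    }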

Deploying Drill in a Cluster

Overview

To run Drill in a clustered environment, complete the following steps:

  1. Install Drill on each designated node in the cluster.
  2. Configure a cluster ID and add ZooKeeper information.
  3. Connect Drill to your data sources.
  4. Start Drill.

Prerequisites

Before you install Apache Drill on nodes in your cluster, you must have the following software and services installed:

  • Oracle JDK version 7
  • Configured and running ZooKeeper quorum
  • Configured and running Hadoop cluster (Recommended)
  • DNS (Recommended)

Installing Drill

Complete the following steps to install Drill on designated nodes:

  1. Download the Drill tarball:

     curl -O http://getdrill.org/drill/download/apache-drill-0.8.0.tar.gz

  2. Issue the following commands to create a Drill installation directory and then explode the tarball into the directory:

     mkdir /opt/drill
     tar xzf apache-drill-<version>.tar.gz --strip=1 -C /opt/drill

  3. If you are using external JAR files, edit drill-env.sh, located in /opt/drill/conf/, and define HADOOP_HOME:

     export HADOOP_HOME="$HOME/hadoop/hadoop-0.20.2/"

  4. In drill-override.conf, create a unique Drill cluster ID, and provide ZooKeeper host names and port numbers to configure a connection to your ZooKeeper quorum.

     1. Edit drill-override.conf, located in ~/drill/drill-<version>/conf/.
     2. Provide a unique cluster-id and the ZooKeeper host names and port numbers in zk.connect. If you install Drill on multiple nodes, assign the same cluster ID to each Drill node so that all Drill nodes share the same ID. The default ZooKeeper port is 2181.

Example

    drill.exec: {
      cluster-id: "<mydrillcluster>",
      zk.connect: "<zkhostname1>:<port>,<zkhostname2>:<port>,<zkhostname3>:<port>",
      debug.error_on_leak: false,
      buffer.size: 6,
      functions: ["org.apache.drill.expr.fn.impl", "org.apache.drill.udfs"]
    }

Connecting Drill to Data Sources

You can connect Drill to various types of data sources. Refer to Connect Apache Drill to Data Sources to get configuration instructions for the particular type of data source that you want to connect to Drill.

Starting Drill

Complete the following steps to start Drill:

  1. Navigate to the Drill installation directory, and issue the following command to start a Drillbit:

     bin/drillbit.sh restart

  2. Issue the following command to invoke SQLLine and start Drill:

     bin/sqlline -u jdbc:drill:

     When connected, the Drill prompt appears. Example:

     0: jdbc:drill:zk=<zk1host>:<port>

     If you cannot connect to Drill, invoke SQLLine with the ZooKeeper quorum:

     bin/sqlline -u jdbc:drill:zk=<zk1host>:<port>,<zk2host>:<port>,<zk3host>:<port>

  3. Issue the following query to Drill to verify that all Drillbits have joined the cluster:

     0: jdbc:drill:zk=<zk1host>:<port> select * from sys.drillbits;

Drill provides a list of Drillbits that have joined.

    +----------------+---------------+---------------+---------------+
    |      host      |   user_port   | control_port  |   data_port   |
    +----------------+---------------+---------------+---------------+
    | <host address> | <port number> | <port number> | <port number> |
    +----------------+---------------+---------------+---------------+
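
You can run the same verification from an application through Drill's JDBC driver. The sketch below is a hedged example: the ZooKeeper hosts and ports are placeholders, and the Drill JDBC driver JAR is assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillbitCheck {
      public static void main(String[] args) throws Exception {
        // Placeholder quorum; list each ZooKeeper node of your cluster.
        String url = "jdbc:drill:zk=zkhost1:2181,zkhost2:2181,zkhost3:2181";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM sys.drillbits")) {
          int count = 0;
          while (rs.next()) {
            count++;
            // First two columns are the Drillbit host and its user port (see the table above).
            System.out.println(rs.getString(1) + ":" + rs.getString(2));
          }
          System.out.println(count + " Drillbit(s) have joined the cluster");
        }
      }
    }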

Example

Now you can query data with Drill. The Drill installation includes sample data that you can query. Refer to Querying Parquet Files.
