Deploying and Using a Hive UDF

If the extensive set of Hive functions that Drill supports, such as the mathematical and date functions, does not meet your needs, you can use a Hive UDF in Drill queries. Drill supports your existing Hive scalar UDFs. You can query Hive tables and access existing Hive input/output formats, including custom SerDes. Drill complements Hive deployments by offering low-latency queries.

Creating the UDF

You create the JAR for a UDF to use in Drill in the conventional way, with a few caveats covered in this section: use a unique function name and include a Drill resource file in the JAR.

  1. Use a unique name for the Hive UDF to avoid conflicts with Drill custom functions of the same name.
  2. Create a custom Hive UDF using either of these APIs (see the sketch after this list):
     • Simple API: org.apache.hadoop.hive.ql.exec.UDF
     • Complex API: org.apache.hadoop.hive.ql.udf.generic.GenericUDF
  3. Create an empty drill-module.conf file in the resources directory of the Java project.
  4. Export the logic to a JAR, including the drill-module.conf file in resources.
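
As an illustration, the following is a minimal sketch of a scalar UDF written against the Simple API. The package and class names are placeholders, and the function implements the upper-to-lower example used later on this page:

    package org.example.hive.udf;              // placeholder package name

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Simple-API UDF: evaluate() is called once per row.
    @Description(name = "upper-to-lower", value = "_FUNC_(str) - returns str in lower case")
    public class UpperToLower extends UDF {
      public Text evaluate(Text input) {
        if (input == null) {
          return null;                         // pass NULL through unchanged
        }
        return new Text(input.toString().toLowerCase());
      }
    }

Compile the class against the Hive exec library and export it, together with the empty drill-module.conf, to the UDF JAR.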

The drill-module.conf file defines startup options and makes the functions in the JAR available to queries throughout the Hadoop cluster. After exporting the UDF logic to a JAR file, set up the UDF in Drill. Drill users can then call the custom UDF in their Drill queries.

Setting Up a UDF

After you export the custom UDF as a JAR, perform the UDF setup tasks so Drill can access the UDF. The JAR needs to be available at query execution time as a session resource, so Drill queries can refer to the UDF by its name.

To set up the UDF:

  1. Register Hive. Register a Hive storage plugin that connects Drill to a Hive data source.
  2. In Drill 0.7 and later, add the JAR for the UDF to the Drill CLASSPATH. In earlier versions of Drill, place the JAR file in the /jars/3rdparty directory of the Drill installation on all nodes running a Drillbit.
  3. On each Drill node in the cluster, restart the Drillbit:

     <drill installation directory>/bin/drillbit.sh restart

Using a UDF

Use a Hive UDF just as you would use a Drill custom function. For example, to query using a Hive UDF named upper-to-lower that takes a column.value argument, the SELECT statement looks something like this:

    SELECT upper-to-lower(my_column.myvalue) FROM mytable;
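
From an application, you can issue the same query through Drill's JDBC driver. The sketch below is an assumption-laden example, not part of the documentation: the ZooKeeper host and port are placeholders, and the Drill JDBC driver JAR is assumed to be on the application classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveUdfFromDrill {
      public static void main(String[] args) throws Exception {
        // Placeholder connection string; point zk= at your ZooKeeper quorum.
        String url = "jdbc:drill:zk=zkhost1:2181";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // Same query as the SELECT statement above, using the Hive UDF by name.
             ResultSet rs = stmt.executeQuery(
                 "SELECT upper-to-lower(my_column.myvalue) FROM mytable")) {
          while (rs.next()) {
            System.out.println(rs.getString(1));
          }
        }
      }
    }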

Deploying Drill in a Cluster

Overview

To run Drill in a clustered environment, complete the following steps:

  1. Install Drill on each designated node in the cluster.
  2. Configure a cluster ID and add ZooKeeper information.
  3. Connect Drill to your data sources.
  4. Start Drill.

Prerequisites

Before you install Apache Drill on nodes in your cluster, you must have the following software and services installed:

  • Oracle JDK version 7
  • Configured and running ZooKeeper quorum
  • Configured and running Hadoop cluster (Recommended)
  • DNS (Recommended)

Installing Drill

Complete the following steps to install Drill on designated nodes:

  1. Download the Drill tarball:

     curl -O http://getdrill.org/drill/download/apache-drill-0.8.0.tar.gz

  2. Issue the following commands to create a Drill installation directory and then explode the tarball into the directory:

     mkdir /opt/drill
     tar xzf apache-drill-<version>.tar.gz --strip=1 -C /opt/drill

  3. If you are using external JAR files, edit drill-env.sh, located in /opt/drill/conf/, and define HADOOP_HOME:

     export HADOOP_HOME="$HOME/hadoop/hadoop-0.20.2/"

  4. In drill-override.conf, create a unique Drill cluster ID, and provide ZooKeeper host names and port numbers to configure a connection to your ZooKeeper quorum.

     1. Edit drill-override.conf, located in ~/drill/drill-<version>/conf/.
     2. Provide a unique cluster-id and the ZooKeeper host names and port numbers in zk.connect. If you install Drill on multiple nodes, assign the same cluster ID to each Drill node so that all Drill nodes share the same ID. The default ZooKeeper port is 2181.

Example

    drill.exec: {
      cluster-id: "<mydrillcluster>",
      zk.connect: "<zkhostname1>:<port>,<zkhostname2>:<port>,<zkhostname3>:<port>",
      debug.error_on_leak: false,
      buffer.size: 6,
      functions: ["org.apache.drill.expr.fn.impl", "org.apache.drill.udfs"]
    }

Connecting Drill to Data Sources

You can connect Drill to various types of data sources. Refer to Connect Apache Drill to Data Sources to get configuration instructions for the particular type of data source that you want to connect to Drill.

Starting Drill

Complete the following steps to start Drill:

  1. Navigate to the Drill installation directory, and issue the following command to start a Drillbit:

     bin/drillbit.sh restart

  2. Issue the following command to invoke SQLLine and start Drill:

     bin/sqlline -u jdbc:drill:

     When connected, the Drill prompt appears. Example:

     0: jdbc:drill:zk=<zk1host>:<port>

     If you cannot connect to Drill, invoke SQLLine with the ZooKeeper quorum:

     bin/sqlline -u jdbc:drill:zk=<zk1host>:<port>,<zk2host>:<port>,<zk3host>:<port>

  3. Issue the following query to Drill to verify that all Drillbits have joined the cluster:

     0: jdbc:drill:zk=<zk1host>:<port> select * from sys.drillbits;

Drill provides a list of Drillbits that have joined.

    +----------------+---------------+---------------+---------------+
    |      host      |   user_port   | control_port  |   data_port   |
    +----------------+---------------+---------------+---------------+
    | <host address> | <port number> | <port number> | <port number> |
    +----------------+---------------+---------------+---------------+
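
You can run the same verification from an application through Drill's JDBC driver. The sketch below is a hedged example: the ZooKeeper hosts and ports are placeholders, and the Drill JDBC driver JAR is assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillbitCheck {
      public static void main(String[] args) throws Exception {
        // Placeholder quorum; list each ZooKeeper node of your cluster.
        String url = "jdbc:drill:zk=zkhost1:2181,zkhost2:2181,zkhost3:2181";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM sys.drillbits")) {
          int count = 0;
          while (rs.next()) {
            count++;
            // First two columns are the Drillbit host and its user port (see the table above).
            System.out.println(rs.getString(1) + ":" + rs.getString(2));
          }
          System.out.println(count + " Drillbit(s) have joined the cluster");
        }
      }
    }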

Example

Now you can query data with Drill. The Drill installation includes sample data that you can query. Refer to Querying Parquet Files.
