From: tshiran@apache.org
To: commits@drill.apache.org
Date: Thu, 15 Jan 2015 05:05:30 -0000
Subject: [6/6] drill git commit: Added Drill docs

Added Drill docs

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/84b7b36d
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/84b7b36d
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/84b7b36d

Branch: refs/heads/gh-pages
Commit: 84b7b36d96875c19cef3bd3395a26e9f129c5d53
Parents: c37bc59
Author: Tomer Shiran
Authored: Wed Jan 14 21:05:10 2015 -0800
Committer: Tomer Shiran
Committed: Wed Jan 14 21:05:10 2015 -0800

----------------------------------------------------------------------
 _config.yml | 3 +
 _docs/001-drill-docs.md | 4 +
_docs/001-user-guide.md | 4 - _docs/002-admin-guide.md | 5 - _docs/drill-docs/001-arch.md | 58 +++ _docs/drill-docs/002-tutorial.md | 58 +++ _docs/drill-docs/003-yelp.md | 402 ++++++++++++++++++ _docs/drill-docs/004-install.md | 20 + _docs/drill-docs/005-connect.md | 49 +++ _docs/drill-docs/006-query.md | 57 +++ _docs/drill-docs/006-sql-ref.md | 25 ++ _docs/drill-docs/007-dev-custom-func.md | 47 +++ _docs/drill-docs/008-manage.md | 23 + _docs/drill-docs/009-develop.md | 16 + _docs/drill-docs/010-rn.md | 192 +++++++++ _docs/drill-docs/011-contribute.md | 11 + _docs/drill-docs/012-sample-ds.md | 11 + _docs/drill-docs/013-design.md | 14 + _docs/drill-docs/014-progress.md | 9 + _docs/drill-docs/015-archived-pages.md | 9 + _docs/drill-docs/016-bylaws.md | 171 ++++++++ _docs/drill-docs/arch/001-core-mod.md | 30 ++ _docs/drill-docs/arch/002-arch-hilite.md | 15 + .../arch/arch-hilite/001-flexibility.md | 79 ++++ .../arch/arch-hilite/002-performance.md | 56 +++ _docs/drill-docs/archive/001-how-to-demo.md | 309 ++++++++++++++ _docs/drill-docs/archive/002-meet-drill.md | 41 ++ _docs/drill-docs/connect/001-plugin-reg.md | 39 ++ _docs/drill-docs/connect/002-mongo-plugin.md | 169 ++++++++ _docs/drill-docs/connect/003-mapr-db-plugin.md | 32 ++ .../connect/workspaces/001-workspaces.md | 82 ++++ .../drill-docs/connect/workspaces/002-reg-fs.md | 69 +++ .../connect/workspaces/003-reg-hbase.md | 34 ++ .../connect/workspaces/004-reg-hive.md | 99 +++++ .../connect/workspaces/005-default-frmt.md | 61 +++ _docs/drill-docs/contribute/001-guidelines.md | 230 ++++++++++ _docs/drill-docs/contribute/002-ideas.md | 158 +++++++ _docs/drill-docs/datasets/001-aol.md | 47 +++ _docs/drill-docs/datasets/002-enron.md | 21 + _docs/drill-docs/datasets/003-wikipedia.md | 105 +++++ _docs/drill-docs/design/001-plan.md | 25 ++ _docs/drill-docs/design/002-rpc.md | 19 + _docs/drill-docs/design/003-query-stages.md | 42 ++ _docs/drill-docs/design/004-research.md | 48 +++ _docs/drill-docs/design/005-value.md | 
191 +++++++++ .../drill-docs/dev-custom-fcn/001-dev-simple.md | 51 +++ .../dev-custom-fcn/002-dev-aggregate.md | 59 +++ .../drill-docs/dev-custom-fcn/003-add-custom.md | 28 ++ .../drill-docs/dev-custom-fcn/004-use-custom.md | 55 +++ .../dev-custom-fcn/005-cust-interface.md | 14 + _docs/drill-docs/develop/001-compile.md | 37 ++ _docs/drill-docs/develop/002-setup.md | 5 + _docs/drill-docs/develop/003-patch-tool.md | 160 +++++++ _docs/drill-docs/install/001-drill-in-10.md | 395 +++++++++++++++++ _docs/drill-docs/install/002-deploy.md | 102 +++++ .../drill-docs/install/003-install-embedded.md | 30 ++ .../install/004-install-distributed.md | 61 +++ .../install-embedded/001-install-linux.md | 30 ++ .../install/install-embedded/002-install-mac.md | 33 ++ .../install/install-embedded/003-install-win.md | 57 +++ _docs/drill-docs/manage/001-conf.md | 20 + _docs/drill-docs/manage/002-start-stop.md | 45 ++ _docs/drill-docs/manage/003-ports.md | 9 + _docs/drill-docs/manage/004-partition-prune.md | 75 ++++ _docs/drill-docs/manage/005-monitor-cancel.md | 30 ++ _docs/drill-docs/manage/conf/001-mem-alloc.md | 31 ++ _docs/drill-docs/manage/conf/002-startup-opt.md | 50 +++ _docs/drill-docs/manage/conf/003-plan-exec.md | 37 ++ .../drill-docs/manage/conf/004-persist-conf.md | 93 ++++ _docs/drill-docs/progress/001-2014-q1.md | 204 +++++++++ _docs/drill-docs/query/001-query-fs.md | 44 ++ _docs/drill-docs/query/002-query-hbase.md | 177 ++++++++ _docs/drill-docs/query/003-query-hive.md | 67 +++ _docs/drill-docs/query/004-query-complex.md | 63 +++ _docs/drill-docs/query/005-query-info-skema.md | 109 +++++ _docs/drill-docs/query/006-query-sys-tbl.md | 176 ++++++++ _docs/drill-docs/query/007-interfaces.md | 16 + _docs/drill-docs/query/interfaces/001-jdbc.md | 138 ++++++ _docs/drill-docs/query/interfaces/002-odbc.md | 23 + .../query/query-complex/001-sample-donuts.md | 40 ++ .../query/query-complex/002-query1-select.md | 19 + .../query/query-complex/003-query2-use-sql.md | 74 ++++ 
.../query/query-complex/004-query3-sel-nest.md | 50 +++ .../query-complex/005-query4-sel-multiple.md | 24 ++ .../drill-docs/query/query-fs/001-query-json.md | 41 ++ .../query/query-fs/002-query-parquet.md | 99 +++++ .../drill-docs/query/query-fs/003-query-text.md | 120 ++++++ .../drill-docs/query/query-fs/004-query-dir.md | 90 ++++ _docs/drill-docs/rn/001-0.5.0rn.md | 29 ++ _docs/drill-docs/rn/002-0.4.0rn.md | 42 ++ _docs/drill-docs/rn/003-alpha-rn.md | 44 ++ _docs/drill-docs/rn/004-0.6.0-rn.md | 32 ++ _docs/drill-docs/rn/005-0.7.0-rn.md | 56 +++ _docs/drill-docs/sql-ref/001-data-types.md | 96 +++++ _docs/drill-docs/sql-ref/002-operators.md | 71 ++++ _docs/drill-docs/sql-ref/003-functions.md | 185 ++++++++ _docs/drill-docs/sql-ref/004-nest-functions.md | 10 + _docs/drill-docs/sql-ref/005-cmd-summary.md | 16 + _docs/drill-docs/sql-ref/006-reserved-wds.md | 16 + .../sql-ref/cmd-summary/001-create-table-as.md | 134 ++++++ .../sql-ref/cmd-summary/002-explain.md | 166 ++++++++ .../sql-ref/cmd-summary/003-select.md | 85 ++++ .../sql-ref/cmd-summary/004-show-files.md | 65 +++ _docs/drill-docs/sql-ref/data-types/001-date.md | 148 +++++++ _docs/drill-docs/sql-ref/nested/001-flatten.md | 89 ++++ _docs/drill-docs/sql-ref/nested/002-kvgen.md | 150 +++++++ .../sql-ref/nested/003-repeated-cnt.md | 34 ++ .../drill-docs/tutorial/001-install-sandbox.md | 56 +++ _docs/drill-docs/tutorial/002-get2kno-sb.md | 235 +++++++++++ _docs/drill-docs/tutorial/003-lesson1.md | 423 +++++++++++++++++++ _docs/drill-docs/tutorial/004-lesson2.md | 392 +++++++++++++++++ _docs/drill-docs/tutorial/005-lesson3.md | 379 +++++++++++++++++ _docs/drill-docs/tutorial/006-summary.md | 14 + .../install-sandbox/001-install-mapr-vm.md | 55 +++ .../install-sandbox/002-install-mapr-vb.md | 72 ++++ _docs/img/11.png | Bin 0 -> 5224 bytes _docs/img/18.png | Bin 0 -> 22253 bytes _docs/img/19.png | Bin 0 -> 22248 bytes _docs/img/21.png | Bin 0 -> 23385 bytes _docs/img/30.png | Bin 0 -> 50936 bytes _docs/img/4.png | Bin 
0 -> 41555 bytes _docs/img/40.png | Bin 0 -> 25898 bytes _docs/img/42.png | Bin 0 -> 153938 bytes _docs/img/46.png | Bin 0 -> 3597 bytes _docs/img/51.png | Bin 0 -> 64520 bytes _docs/img/52.png | Bin 0 -> 21243 bytes _docs/img/53.png | Bin 0 -> 52269 bytes _docs/img/54.png | Bin 0 -> 10704 bytes _docs/img/7.png | Bin 0 -> 30755 bytes _docs/img/DrillWebUI.png | Bin 0 -> 53187 bytes _docs/img/DrillbitModules.png | Bin 0 -> 54907 bytes _docs/img/Overview.png | Bin 0 -> 165981 bytes _docs/img/StoragePluginConfig.png | Bin 0 -> 20403 bytes _docs/img/drill-runtime.png | Bin 0 -> 78592 bytes _docs/img/drill2.png | Bin 0 -> 22806 bytes _docs/img/example_query.png | Bin 0 -> 63294 bytes _docs/img/loginSandBox.png | Bin 0 -> 53970 bytes _docs/img/queryFlow.png | Bin 0 -> 37031 bytes _docs/img/slide-15-638.png | Bin 0 -> 58477 bytes _docs/img/storageplugin.png | Bin 0 -> 7589 bytes _docs/img/value1.png | Bin 0 -> 10017 bytes _docs/img/value2.png | Bin 0 -> 13935 bytes _docs/img/value3.png | Bin 0 -> 14694 bytes _docs/img/value4.png | Bin 0 -> 20297 bytes _docs/img/value5.png | Bin 0 -> 44110 bytes _docs/img/value6.png | Bin 0 -> 13854 bytes _docs/img/value7.png | Bin 0 -> 21719 bytes _docs/img/vbApplSettings.png | Bin 0 -> 45140 bytes _docs/img/vbEthernet.png | Bin 0 -> 31861 bytes _docs/img/vbGenSettings.png | Bin 0 -> 56436 bytes _docs/img/vbImport.png | Bin 0 -> 29075 bytes _docs/img/vbMaprSetting.png | Bin 0 -> 56436 bytes _docs/img/vbNetwork.png | Bin 0 -> 32117 bytes _docs/img/vbloginSandBox.png | Bin 0 -> 52169 bytes _docs/img/vmLibrary.png | Bin 0 -> 68085 bytes _docs/img/vmShare.png | Bin 0 -> 49069 bytes _docs/img/vmWelcome.png | Bin 0 -> 39255 bytes _docs/user-guide/001-views.md | 5 - _docs/user-guide/002-sql-syntax.md | 5 - _docs/user-guide/sql-syntax/001-ddd-ddd.md | 7 - _docs/user-guide/views/001-aaa-aaa.md | 7 - _docs/user-guide/views/002-bbb-bbb.md | 7 - _docs/user-guide/views/003-ccc-ccc.md | 7 - 163 files changed, 9355 insertions(+), 47 deletions(-) 
---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_config.yml ---------------------------------------------------------------------- diff --git a/_config.yml b/_config.yml index aeba641..3b254bd 100644 --- a/_config.yml +++ b/_config.yml @@ -13,6 +13,9 @@ baseurl: "/drill" # Base URL when hosted on GitHub Pages (drill repository under noindex: 1 markdown: redcarpet +redcarpet: + extensions: ["no_intra_emphasis", "fenced_code_blocks", "autolink", "tables", "with_toc_data"] + collections: docs: output: true http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/001-drill-docs.md ---------------------------------------------------------------------- diff --git a/_docs/001-drill-docs.md b/_docs/001-drill-docs.md new file mode 100644 index 0000000..382e2e1 --- /dev/null +++ b/_docs/001-drill-docs.md @@ -0,0 +1,4 @@ +--- +title: "Apache Drill Documentation" +--- +The Drill documentation covers how to install, configure, and use Apache Drill. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/001-user-guide.md ---------------------------------------------------------------------- diff --git a/_docs/001-user-guide.md b/_docs/001-user-guide.md deleted file mode 100644 index be161df..0000000 --- a/_docs/001-user-guide.md +++ /dev/null @@ -1,4 +0,0 @@ ---- -title: "User Guide" ---- -This is a user guide! \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/002-admin-guide.md ---------------------------------------------------------------------- diff --git a/_docs/002-admin-guide.md b/_docs/002-admin-guide.md deleted file mode 100644 index ad1ce75..0000000 --- a/_docs/002-admin-guide.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -title: "Admin Guide" -nocontent: true ---- -This is an Admin Guide... 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/001-arch.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/001-arch.md b/_docs/drill-docs/001-arch.md new file mode 100644 index 0000000..e4b26fc --- /dev/null +++ b/_docs/drill-docs/001-arch.md @@ -0,0 +1,58 @@ +--- +title: "Architectural Overview" +parent: "Apache Drill Documentation" +--- +Apache Drill is a low-latency distributed query engine for large-scale +datasets, including structured and semi-structured/nested data. Inspired by +Google’s Dremel, Drill is designed to scale to several thousand nodes and +query petabytes of data at the interactive speeds that BI/Analytics environments +require. + +### High-Level Architecture + +Drill includes a distributed execution environment, purpose-built for +large-scale data processing. At the core of Apache Drill is the ‘Drillbit’ service, +which is responsible for accepting requests from the client, processing the +queries, and returning results to the client. + +A Drillbit service can be installed and run on all of the required nodes in a +Hadoop cluster to form a distributed cluster environment. When a Drillbit runs +on each data node in the cluster, Drill can maximize data locality during +query execution without moving data over the network or between nodes. Drill +uses ZooKeeper to maintain cluster membership and health-check information. + +Though Drill works in a Hadoop cluster environment, Drill is not tied to +Hadoop and can run in any distributed cluster environment. The only +prerequisite for Drill is ZooKeeper. + +### Query Flow in Drill + +The following image represents the flow of a Drill query: + +![](../img/queryFlow.png) + + +The flow of a Drill query typically involves the following steps: + + 1. The Drill client issues a query. Any Drillbit in the cluster can accept queries from clients. 
There is no master-slave concept. + 2. The Drillbit then parses the query, optimizes it, and generates an optimized distributed query plan for fast and efficient execution. + 3. The Drillbit that accepts the query becomes the driving Drillbit node for the request. It gets a list of available Drillbit nodes in the cluster from ZooKeeper. The driving Drillbit determines the appropriate nodes to execute various query plan fragments to maximize data locality. + 4. The Drillbit schedules the execution of query fragments on individual nodes according to the execution plan. + 5. The individual nodes finish their execution and return data to the driving Drillbit. + 6. The driving Drillbit returns results back to the client. + +### Drill Clients + +You can access Drill through the following interfaces: + + * Drill shell (SQLLine) + * Drill Web UI + * ODBC + * JDBC + * C++ API + +Click on either of the following links to continue reading about Drill's +architecture: + + * [Core Modules within a Drillbit](/confluence/display/DRILL/Core+Modules+within+a+Drillbit) + * [Architectural Highlights](/confluence/display/DRILL/Architectural+Highlights) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/002-tutorial.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/002-tutorial.md b/_docs/drill-docs/002-tutorial.md new file mode 100644 index 0000000..597f994 --- /dev/null +++ b/_docs/drill-docs/002-tutorial.md @@ -0,0 +1,58 @@ +--- +title: "Apache Drill Tutorial" +parent: "Apache Drill Documentation" +--- +This tutorial uses the MapR Sandbox, which is a Hadoop environment pre- +configured with Apache Drill. 
+ +To complete the tutorial on the MapR Sandbox with Apache Drill, work through +the following pages in order: + + * [Installing the Apache Drill Sandbox](/confluence/display/DRILL/Installing+the+Apache+Drill+Sandbox) + * [Getting to Know the Drill Setup](/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup) + * [Lesson 1: Learn About the Data Set](/confluence/display/DRILL/Lesson+1%3A+Learn+About+the+Data+Set) + * [Lesson 2: Run Queries with ANSI SQL](/confluence/display/DRILL/Lesson+2%3A+Run+Queries+with+ANSI+SQL) + * [Lesson 3: Run Queries on Complex Data Types](/confluence/display/DRILL/Lesson+3%3A+Run+Queries+on+Complex+Data+Types) + * [Summary](/confluence/display/DRILL/Summary) + +# About Apache Drill + +Drill is an Apache open-source SQL query engine for Big Data exploration. +Drill is designed from the ground up to support high-performance analysis on +the semi-structured and rapidly evolving data coming from modern Big Data +applications, while still providing the familiarity and ecosystem of ANSI SQL, +the industry-standard query language. Drill provides plug-and-play integration +with existing Apache Hive and Apache HBase deployments. Apache Drill 0.5 offers +the following key features: + + * Low-latency SQL queries + + * Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore. + + * ANSI SQL + + * Nested data support + + * Integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs) + + * BI/SQL tool integration using standard JDBC/ODBC drivers + +# MapR Sandbox with Apache Drill + +MapR includes Apache Drill as part of the Hadoop distribution. The MapR +Sandbox with Apache Drill is a fully functional single-node cluster that can +be used to get an overview of Apache Drill in a Hadoop environment. 
Business +and technical analysts, product managers, and developers can use the sandbox +environment to get a feel for the power and capabilities of Apache Drill by +performing various types of queries. Once you get a flavor for the technology, +refer to the [Apache Drill web site](http://incubator.apache.org/drill/) and +[Apache Drill documentation](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki) +for more +details. + +Note that Hadoop is not a prerequisite for Drill and users can start ramping +up with Drill by running SQL queries directly on the local file system. Refer +to [Apache Drill in 10 minutes](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes) +for an introduction to using Drill in local +(embedded) mode. + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/003-yelp.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/003-yelp.md b/_docs/drill-docs/003-yelp.md new file mode 100644 index 0000000..b9339ed --- /dev/null +++ b/_docs/drill-docs/003-yelp.md @@ -0,0 +1,402 @@ +--- +title: "Analyzing Yelp JSON Data with Apache Drill" +parent: "Apache Drill Documentation" +--- +[Apache Drill](https://www.mapr.com/products/apache-drill) is one of the +fastest growing open source projects, with the community making rapid progress +with monthly releases. Drill’s key differentiators are its agility and flexibility. +Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve +low-latency performance at scale, Drill allows users to analyze the data without +any ETL or up-front schema definitions. The data could be in any file format +such as text, JSON, or Parquet. Data could have simple types such as string, +integer, dates, or more complex multi-structured data, such as nested maps and +arrays. 
Data can exist in any file system, local or distributed, such as HDFS, +[MapR FS](https://www.mapr.com/blog/comparing-mapr-fs-and-hdfs-nfs-and-snapshots), or S3. +Drill has a “no schema” approach, which enables you to get +value from your data in just a few minutes. + +Let’s quickly walk through the steps required to install Drill and run it +against the Yelp data set. The publicly available data set used for this +example is downloadable from [Yelp](http://www.yelp.com/dataset_challenge) +(business reviews) and is in JSON format. + +## Installing and Starting Drill + +### Step 1: Download Apache Drill onto your local machine + +[http://incubator.apache.org/drill/download/](http://incubator.apache.org/drill/download/) + +You can also [deploy Drill in clustered mode](https://cwiki.apache.org/confluence/display/DRILL/Deploying+Apache+Drill+in+a+Clustered+Environment) +if you +want to scale your environment. + +### Step 2: Open the Drill tar file + +`tar -xvf apache-drill-0.6.0-incubating.tar` + +### Step 3: Launch sqlline, a JDBC application that ships with Drill + +`bin/sqlline -u jdbc:drill:zk=local` + +That’s it! You are now ready to explore the data. + +Let’s try out some SQL examples to understand how Drill makes raw data +analysis extremely easy. + +**Note**: You need to substitute your local path to the Yelp data set in the FROM clause of each query you run. + +## Querying Data with Drill + +### **1\. 
View the contents of the Yelp business data** + +`0: jdbc:drill:zk=local> !set maxwidth 10000` + +``0: jdbc:drill:zk=local> select * from +dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` +limit 1;`` + + +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ + | business_id | full_address | hours | open | categories | city | review_count | name | longitude | state | stars | latitude | attributes | type | neighborhoods | + +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ + | vcNAWiLM4dR7D2nwwJ7nCA | 4840 E Indian School Rd + Ste 101 + Phoenix, AZ 85018 | {"Tuesday":{"close":"17:00","open":"08:00"},"Friday":{"close":"17:00","open":"08:00"},"Monday":{"close":"17:00","open":"08:00"},"Wednesday":{"close":"17:00","open":"08:00"},"Thursday":{"close":"17:00","open":"08:00"},"Sunday":{},"Saturday":{}} | true | ["Doctors","Health & Medical"] | Phoenix | 7 | Eric Goldberg, MD | -111.983758 | AZ | 3.5 | 33.499313 | {"By Appointment Only":true,"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | business | [] | + +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ + +**Note: **You can directly query self-describing files such as JSON, Parquet, and text. There is no need to create metadata definitions in the Hive metastore. + +### **2\. 
Explore the business data set further** + +#### Total reviews in the data set + +``0: jdbc:drill:zk=local> select sum(review_count) as totalreviews from +dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` +;`` + + +--------------+ + | totalreviews | + +--------------+ + | 1236445 | + +--------------+ + +#### Top states and cities in total number of reviews + +``0: jdbc:drill:zk=local> select state, city, count(*) totalreviews from +dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` +group by state, city order by count(*) desc limit 10;`` + + +------------+------------+--------------+ + | state | city | totalreviews | + +------------+------------+--------------+ + | NV | Las Vegas | 12021 | + | AZ | Phoenix | 7499 | + | AZ | Scottsdale | 3605 | + | EDH | Edinburgh | 2804 | + | AZ | Mesa | 2041 | + | AZ | Tempe | 2025 | + | NV | Henderson | 1914 | + | AZ | Chandler | 1637 | + | WI | Madison | 1630 | + | AZ | Glendale | 1196 | + +------------+------------+--------------+ + +#### **Average number of reviews per business star rating** + +``0: jdbc:drill:zk=local> select stars,trunc(avg(review_count)) reviewsavg from +dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` +group by stars order by stars desc;`` + + +------------+------------+ + | stars | reviewsavg | + +------------+------------+ + | 5.0 | 8.0 | + | 4.5 | 28.0 | + | 4.0 | 48.0 | + | 3.5 | 35.0 | + | 3.0 | 26.0 | + | 2.5 | 16.0 | + | 2.0 | 11.0 | + | 1.5 | 9.0 | + | 1.0 | 4.0 | + +------------+------------+ + +#### **Top businesses with high review counts (> 1000)** + +``0: jdbc:drill:zk=local> select name, state, city, `review_count` from +dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` +where review_count > 1000 order by `review_count` desc limit 10;`` + + +------------+------------+------------+----------------------------+ + | name | state | city | review_count | + 
+------------+------------+------------+----------------------------+ + | Mon Ami Gabi | NV | Las Vegas | 4084 | + | Earl of Sandwich | NV | Las Vegas | 3655 | + | Wicked Spoon | NV | Las Vegas | 3408 | + | The Buffet | NV | Las Vegas | 2791 | + | Serendipity 3 | NV | Las Vegas | 2682 | + | Bouchon | NV | Las Vegas | 2419 | + | The Buffet at Bellagio | NV | Las Vegas | 2404 | + | Bacchanal Buffet | NV | Las Vegas | 2369 | + | The Cosmopolitan of Las Vegas | NV | Las Vegas | 2253 | + | Aria Hotel & Casino | NV | Las Vegas | 2224 | + +------------+------------+------------+----------------------------+ + +#### **Saturday open and close times for a few businesses** + +``0: jdbc:drill:zk=local> select b.name, b.hours.Saturday.`open`, +b.hours.Saturday.`close` +from +dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` +b limit 10;`` + + +------------+------------+----------------------------+ + | name | EXPR$1 | EXPR$2 | + +------------+------------+----------------------------+ + | Eric Goldberg, MD | 08:00 | 17:00 | + | Pine Cone Restaurant | null | null | + | Deforest Family Restaurant | 06:00 | 22:00 | + | Culver's | 10:30 | 22:00 | + | Chang Jiang Chinese Kitchen| 11:00 | 22:00 | + | Charter Communications | null | null | + | Air Quality Systems | null | null | + | McFarland Public Library | 09:00 | 20:00 | + | Green Lantern Restaurant | 06:00 | 02:00 | + | Spartan Animal Hospital | 07:30 | 18:00 | + +------------+------------+----------------------------+ + +Note how Drill can traverse and reference multiple levels of nesting. + +### **3\. Get the amenities of each business in the data set** + +Note that the attributes column in the Yelp business data set has a different +element for every row, because each business can have different +amenities. Drill makes it easy to quickly access data sets with changing +schemas. + +First, change Drill to work in all text mode (so we can take a look at all of +the data). 
+ + 0: jdbc:drill:zk=local> alter system set `store.json.all_text_mode` = true; + +------------+-----------------------------------+ + | ok | summary | + +------------+-----------------------------------+ + | true | store.json.all_text_mode updated. | + +------------+-----------------------------------+ + +Then, query the attribute’s data. + + 0: jdbc:drill:zk=local> select attributes from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` limit 10; + +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | attributes | + +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | {"By Appointment Only":"true","Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | + | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"true","dinner":"false","breakfast":"false","brunch":"false"},"Caters":"false","Noise Level":"averag | + | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"false","dinner":"false","breakfast":"false","brunch":"true"},"Caters":"false","Noise Level":"quiet" | + | {"Take-out":"true","Good For":{},"Takes Reservations":"false","Delivery":"false","Ambience":{},"Parking":{"garage":"false","street":"false","validated":"false","lot":"true","val | + | {"Take-out":"true","Good For":{},"Ambience":{},"Parking":{},"Has TV":"false","Outdoor Seating":"false","Attire":"casual","Music":{},"Hair Types Specialized In":{},"Payment Types | + | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | + | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment 
Types":{},"Dietary Restrictions":{}} | + | {"Good For":{},"Ambience":{},"Parking":{},"Wi-Fi":"free","Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | + | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"false","dinner":"true","breakfast":"false","brunch":"false"},"Noise Level":"average","Takes Reserva | + | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | + +------------+ + +Turn off the all text mode so we can continue to perform arithmetic operations +on data. + + 0: jdbc:drill:zk=local> alter system set `store.json.all_text_mode` = false; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | store.json.all_text_mode updated. | + +**4\. Explore the restaurant businesses in the data set** + +#### **Number of restaurants in the data set**** ** + + 0: jdbc:drill:zk=local> select count(*) as TotalRestaurants from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants'); + +------------------+ + | TotalRestaurants | + +------------------+ + | 14303 | + +------------------+ + +#### **Top restaurants in number of reviews** + + 0: jdbc:drill:zk=local> select name,state,city,`review_count` from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by `review_count` desc limit 10 + . . . . . . . . . . . 
> ; + +------------+------------+------------+--------------+ + | name | state | city | review_count | + +------------+------------+------------+--------------+ + | Mon Ami Gabi | NV | Las Vegas | 4084 | + | Earl of Sandwich | NV | Las Vegas | 3655 | + | Wicked Spoon | NV | Las Vegas | 3408 | + | The Buffet | NV | Las Vegas | 2791 | + | Serendipity 3 | NV | Las Vegas | 2682 | + | Bouchon | NV | Las Vegas | 2419 | + | The Buffet at Bellagio | NV | Las Vegas | 2404 | + | Bacchanal Buffet | NV | Las Vegas | 2369 | + | Hash House A Go Go | NV | Las Vegas | 2201 | + | Mesa Grill | NV | Las Vegas | 2004 | + +------------+------------+------------+--------------+ + +**Top restaurants in number of listed categories** + + 0: jdbc:drill:zk=local> select name,repeated_count(categories) as categorycount, categories from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by repeated_count(categories) desc limit 10; + +------------+---------------+------------+ + | name | categorycount | categories | + +------------+---------------+------------+ + | Binion's Hotel & Casino | 10 | ["Arts & Entertainment","Restaurants","Bars","Casinos","Event Planning & Services","Lounges","Nightlife","Hotels & Travel","American (N | + | Stage Deli | 10 | ["Arts & Entertainment","Food","Hotels","Desserts","Delis","Casinos","Sandwiches","Hotels & Travel","Restaurants","Event Planning & Services"] | + | Jillian's | 9 | ["Arts & Entertainment","American (Traditional)","Music Venues","Bars","Dance Clubs","Nightlife","Bowling","Active Life","Restaurants"] | + | Hotel Chocolat | 9 | ["Coffee & Tea","Food","Cafes","Chocolatiers & Shops","Specialty Food","Event Planning & Services","Hotels & Travel","Hotels","Restaurants"] | + | Hotel du Vin & Bistro Edinburgh | 9 | ["Modern European","Bars","French","Wine Bars","Event Planning & Services","Nightlife","Hotels & Travel","Hotels","Restaurants" | + | Elixir | 9 | ["Arts 
& Entertainment","American (Traditional)","Music Venues","Bars","Cocktail Bars","Nightlife","American (New)","Local Flavor","Restaurants"] | + | Tocasierra Spa and Fitness | 8 | ["Beauty & Spas","Gyms","Medical Spas","Health & Medical","Fitness & Instruction","Active Life","Day Spas","Restaurants"] | + | Costa Del Sol At Sunset Station | 8 | ["Steakhouses","Mexican","Seafood","Event Planning & Services","Hotels & Travel","Italian","Restaurants","Hotels"] | + | Scottsdale Silverado Golf Club | 8 | ["Fashion","Shopping","Sporting Goods","Active Life","Golf","American (New)","Sports Wear","Restaurants"] | + | House of Blues | 8 | ["Arts & Entertainment","Music Venues","Restaurants","Hotels","Event Planning & Services","Hotels & Travel","American (New)","Nightlife"] | + +------------+---------------+------------+ + +#### **Top first categories in number of review counts** + + 0: jdbc:drill:zk=local> select categories[0], count(categories[0]) as categorycount from dfs.`/users/nrentachintala/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json` group by categories[0] + order by count(categories[0]) desc limit 10; + +------------+---------------+ + | EXPR$0 | categorycount | + +------------+---------------+ + | Food | 4294 | + | Shopping | 1885 | + | Active Life | 1676 | + | Bars | 1366 | + | Local Services | 1351 | + | Mexican | 1284 | + | Hotels & Travel | 1283 | + | Fast Food | 963 | + | Arts & Entertainment | 906 | + | Hair Salons | 901 | + +------------+---------------+ + +**5\. 
Explore the Yelp reviews dataset and combine with the businesses.** + +#### **Take a look at the contents of the Yelp reviews dataset.** + + 0: jdbc:drill:zk=local> select * from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` limit 1; + +------------+------------+------------+------------+------------+------------+------------+-------------+ + | votes | user_id | review_id | stars | date | text | type | business_id | + +------------+------------+------------+------------+------------+------------+------------+-------------+ + | {"funny":0,"useful":2,"cool":1} | Xqd0DzHaiyRqVH3WRG7hzg | 15SdjuK7DmYqUAj6rjGowg | 5 | 2007-05-17 | dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank. | review | vcNAWiLM4dR7D2nwwJ7nCA | + +------------+------------+------------+------------+------------+------------+------------+-------------+ + +#### **Top businesses with cool rated reviews** + +Note that we are combining the Yelp business data set, which has the overall +review_count, with the Yelp review data, which holds additional details on each +of the reviews themselves. 
+ + 0: jdbc:drill:zk=local> Select b.name from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` b where b.business_id in (SELECT r.business_id FROM dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` r + GROUP BY r.business_id having sum(r.votes.cool) > 2000 order by sum(r.votes.cool) desc); + +------------+ + | name | + +------------+ + | Earl of Sandwich | + | XS Nightclub | + | The Cosmopolitan of Las Vegas | + | Wicked Spoon | + +------------+ + +**Create a view with the combined business and reviews data sets** + +Note that Drill views are lightweight and can be created in the local +file system. Drill in standalone mode comes with a dfs.tmp workspace, which we +can use to create views (or you can define your own workspaces on a local +or distributed file system). If you want to persist the data physically +instead of in a logical view, you can use CREATE TABLE AS SELECT syntax. + + 0: jdbc:drill:zk=local> create or replace view dfs.tmp.businessreviews as Select b.name,b.stars,b.state,b.city,r.votes.funny,r.votes.useful,r.votes.cool, r.`date` from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` b , dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` r where r.business_id=b.business_id; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | View 'businessreviews' created successfully in 'dfs.tmp' schema | + +------------+------------+ + +Let’s get the total number of records from the view. + + 0: jdbc:drill:zk=local> select count(*) as Total from dfs.tmp.businessreviews; + +------------+ + | Total | + +------------+ + | 1125458 | + +------------+ + +In addition to these queries, you can get many deeper insights using +Drill’s [SQL functionality](https://cwiki.apache.org/confluence/display/DRILL/ +SQL+Reference). 
If you are not comfortable with writing queries manually, you +can use BI/analytics tools such as Tableau or MicroStrategy to query raw +files/Hive/HBase data or Drill-created views directly using Drill ODBC/JDBC +drivers. + +The goal of Apache Drill is to provide freedom and flexibility in +exploring data in ways we have never seen before with SQL technologies. The +community is working on more exciting features around nested data and +supporting data with changing schemas in upcoming releases. + +As an example, a new FLATTEN function is in development (an upcoming feature +in 0.7). This function can be used to dynamically rationalize semi-structured +data so you can apply even deeper SQL functionality. Here is a sample query: + +#### **Get a flattened list of categories for each business** + + 0: jdbc:drill:zk=local> select name, flatten(categories) as category from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` limit 20; + +------------+------------+ + | name | category | + +------------+------------+ + | Eric Goldberg, MD | Doctors | + | Eric Goldberg, MD | Health & Medical | + | Pine Cone Restaurant | Restaurants | + | Deforest Family Restaurant | American (Traditional) | + | Deforest Family Restaurant | Restaurants | + | Culver's | Food | + | Culver's | Ice Cream & Frozen Yogurt | + | Culver's | Fast Food | + | Culver's | Restaurants | + | Chang Jiang Chinese Kitchen | Chinese | + | Chang Jiang Chinese Kitchen | Restaurants | + | Charter Communications | Television Stations | + | Charter Communications | Mass Media | + | Air Quality Systems | Home Services | + | Air Quality Systems | Heating & Air Conditioning/HVAC | + | McFarland Public Library | Libraries | + | McFarland Public Library | Public Services & Government | + | Green Lantern Restaurant | American (Traditional) | + | Green Lantern Restaurant | Restaurants | + | Spartan Animal Hospital | Veterinarians | + +------------+------------+ + +**Top categories used in 
business reviews** + + 0: jdbc:drill:zk=local> select celltbl.catl, count(celltbl.catl) categorycnt from (select flatten(categories) catl from dfs.`/users/nrentachintala/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json` ) celltbl group by celltbl.catl order by count(celltbl.catl) desc limit 10 ; + +------------+-------------+ + | catl | categorycnt | + +------------+-------------+ + | Restaurants | 14303 | + | Shopping | 6428 | + | Food | 5209 | + | Beauty & Spas | 3421 | + | Nightlife | 2870 | + | Bars | 2378 | + | Health & Medical | 2351 | + | Automotive | 2241 | + | Home Services | 1957 | + | Fashion | 1897 | + +------------+-------------+ + +Stay tuned for more features and upcoming activities in the Drill community. + +To learn more about Drill, please refer to the following resources: + + * Download Drill here: + * 10 reasons we think Drill is cool: + * A simple 10-minute tutorial: + * A more comprehensive tutorial: + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/004-install.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/004-install.md b/_docs/drill-docs/004-install.md new file mode 100644 index 0000000..fe7578c --- /dev/null +++ b/_docs/drill-docs/004-install.md @@ -0,0 +1,20 @@ +--- +title: "Install Drill" +parent: "Apache Drill Documentation" +--- +You can install Drill in embedded mode or in distributed mode. Installing +Drill in embedded mode does not require any configuration, which means that +you can quickly get started with Drill. If you want to use Drill in a +clustered Hadoop environment, you can install Drill in distributed mode. +Installing in distributed mode requires some configuration. However, once you +install Drill, you can connect it to your Hive, HBase, or distributed file system +data sources and run queries on them. 
+ +Click on any of the following links for more information about how to install +Drill in embedded or distributed mode: + + * [Apache Drill in 10 Minutes](/confluence/display/DRILL/Apache+Drill+in+10+Minutes) + * [Deploying Apache Drill in a Clustered Environment](/confluence/display/DRILL/Deploying+Apache+Drill+in+a+Clustered+Environment) + * [Installing Drill in Embedded Mode](/confluence/display/DRILL/Installing+Drill+in+Embedded+Mode) + * [Installing Drill in Distributed Mode](/confluence/display/DRILL/Installing+Drill+in+Distributed+Mode) + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/005-connect.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/005-connect.md b/_docs/drill-docs/005-connect.md new file mode 100644 index 0000000..039fc78 --- /dev/null +++ b/_docs/drill-docs/005-connect.md @@ -0,0 +1,49 @@ +--- +title: "Connect to Data Sources" +parent: "Apache Drill Documentation" +--- +Apache Drill serves as a query layer that connects to data sources through +storage plugins. Drill uses the storage plugins to interact with data sources. +You can think of a storage plugin as a connection between Drill and a data +source. 
+ +The following image represents the storage plugin layer between Drill and a +data source: + +![](../img/storageplugin.png) + +Storage plugins provide the following information to Drill: + + * Metadata available in the underlying data source + * Location of data + * Interfaces that Drill can use to read from and write to data sources + * A set of storage plugin optimization rules that assist with efficient and faster execution of Drill queries, such as pushdowns, statistics, and partition awareness + +Storage plugins perform scanner and writer functions, and inform the metadata +repository of any known metadata, such as: + + * Schema + * File size + * Data ordering + * Secondary indices + * Number of blocks + +Storage plugins inform the execution engine of any native capabilities, such +as predicate pushdown, joins, and SQL. + +Drill provides storage plugins for files and HBase/M7. Drill also integrates +with Hive through a storage plugin. Hive provides a metadata abstraction layer +on top of files and HBase/M7. + +When you run Drill to query files in HBase/M7, Drill can perform direct +queries on the data or go through Hive, if you have metadata defined there. +Drill integrates with the Hive metastore for metadata and also uses a Hive +SerDe for the deserialization of records. Drill does not invoke the Hive +execution engine for any requests. 
+ +For information about how to connect Drill to your data sources, refer to +storage plugin registration: + + * [Storage Plugin Registration](/confluence/display/DRILL/Storage+Plugin+Registration) + * [MongoDB Plugin for Apache Drill](/confluence/display/DRILL/MongoDB+Plugin+for+Apache+Drill) + * [MapR-DB Plugin for Apache Drill](/confluence/display/DRILL/MapR-DB+Plugin+for+Apache+Drill) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/006-query.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/006-query.md b/_docs/drill-docs/006-query.md new file mode 100644 index 0000000..4b4fda0 --- /dev/null +++ b/_docs/drill-docs/006-query.md @@ -0,0 +1,57 @@ +--- +title: "Query Data" +parent: "Apache Drill Documentation" +--- +You can query local and distributed file systems, Hive, and HBase data sources +registered with Drill. If you connected directly to a particular schema when +you invoked SQLLine, you can issue SQL queries against that schema. If you did +not indicate a schema when you invoked SQLLine, you can issue the `USE +` statement to run your queries against a particular schema. After you +issue the `USE` statement, you can use absolute notation, such as +`schema.table.column`. + +Click on any of the following links for information about various data source +queries and examples: + + * [Querying a File System](/confluence/display/DRILL/Querying+a+File+System) + * [Querying HBase](/confluence/display/DRILL/Querying+HBase) + * [Querying Hive](/confluence/display/DRILL/Querying+Hive) + * [Querying Complex Data](/confluence/display/DRILL/Querying+Complex+Data) + * [Querying the INFORMATION_SCHEMA](/confluence/display/DRILL/Querying+the+INFORMATION_SCHEMA) + * [Querying System Tables](/confluence/display/DRILL/Querying+System+Tables) + * [Drill Interfaces](/confluence/display/DRILL/Drill+Interfaces) + +You may need to use casting functions in some queries. 
For example, you may +have to cast a string `"100"` to an integer in order to apply a math function +or an aggregate function. + +You can use the EXPLAIN command to analyze errors and troubleshoot queries +that do not run. For example, if you run into a casting error, the query plan +text may help you isolate the problem. + + 0: jdbc:drill:zk=local> !set maxwidth 10000 + 0: jdbc:drill:zk=local> explain plan for select ... ; + +The `!set maxwidth` command increases the width of the text display (number of characters). By +default, most of the plan output is hidden. + +You may see errors if you try to use non-standard or unsupported SQL syntax in +a query. + +Remember the following tips when querying data with Drill: + + * Include a semicolon at the end of SQL statements, except when you issue a command with an exclamation point (`!`). +Example: `!set maxwidth 10000` + + * Use backticks around file and directory names that contain special characters and also around reserved words when you query a file system. +The following special characters require backticks: + + * . (period) + * / (forward slash) + * _ (underscore) + +Example: ``SELECT * FROM dfs.default.`sample_data/my_sample.json`; `` + + * `CAST` data to `VARCHAR` if an expression in a query returns `VARBINARY` as the result type in order to view the `VARBINARY` types as readable data. If you do not use the `CAST` function, Drill returns the results as byte data. +Example: `CAST (VARBINARY_expr as VARCHAR(50))` + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/006-sql-ref.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/006-sql-ref.md b/_docs/drill-docs/006-sql-ref.md new file mode 100644 index 0000000..8818ca3 --- /dev/null +++ b/_docs/drill-docs/006-sql-ref.md @@ -0,0 +1,25 @@ +--- +title: "SQL Reference" +parent: "Apache Drill Documentation" +--- +Drill supports the ANSI standard for SQL. 
You can use SQL to query your Hive, +HBase, and distributed file system data sources. Drill can discover the form +of the data when you submit a query. You can query text files and nested data +formats, such as JSON and Parquet. Drill provides special operators and +functions that you can use to _drill down_ into nested data formats. + +Drill queries do not require information about the data that you are trying to +access, regardless of its source system or its schema and data types. The +sweet spot for Apache Drill is a SQL query workload against "complex data": +data made up of various types of records and fields, rather than data in a +recognizable relational form (discrete rows and columns). + +Refer to the following SQL reference pages for more information: + + * [Data Types](/confluence/display/DRILL/Data+Types) + * [Operators](/confluence/display/DRILL/Operators) + * [SQL Functions](/confluence/display/DRILL/SQL+Functions) + * [Nested Data Functions](/confluence/display/DRILL/Nested+Data+Functions) + * [SQL Commands Summary](/confluence/display/DRILL/SQL+Commands+Summary) + * [Reserved Keywords](/confluence/display/DRILL/Reserved+Keywords) + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/007-dev-custom-func.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/007-dev-custom-func.md b/_docs/drill-docs/007-dev-custom-func.md new file mode 100644 index 0000000..9bc8e65 --- /dev/null +++ b/_docs/drill-docs/007-dev-custom-func.md @@ -0,0 +1,47 @@ +--- +title: "Develop Custom Functions" +parent: "Apache Drill Documentation" +--- + +Drill provides a high-performance Java API with interfaces that you can +implement to develop simple and aggregate custom functions. Custom functions +are reusable SQL functions that you develop in Java to encapsulate code that +processes column values during a query. 
Custom functions can perform +calculations and transformations that built-in SQL operators and functions do +not provide. Custom functions are called from within a SQL statement, like a +regular function, and return a single value. + +### Simple Function + +A simple function operates on a single row and produces a single row as the +output. When you include a simple function in a query, the function is called +once for each row in the result set. Mathematical and string functions are +examples of simple functions. + +### Aggregate Function + +Aggregate functions differ from simple functions in the number of rows that +they accept as input. An aggregate function operates on multiple input rows +and produces a single row as output. The COUNT(), MAX(), SUM(), and AVG() +functions are examples of aggregate functions. You can use an aggregate +function in a query with a GROUP BY clause to produce a result set with a +separate aggregate value for each combination of values from the GROUP BY +clause. + +### Process + +To develop custom functions that you can use in your Drill queries, you must +complete the following tasks: + + 1. Create a Java program that implements Drill’s simple or aggregate interface, and compile a sources JAR file and a classes JAR file. + 2. Add the sources and classes JAR files to Drill’s classpath. + 3. Add the name of the package that contains the classes to Drill’s main configuration file, drill-override.conf. 
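
The difference between the two function kinds described above can be sketched in plain Java. This is only a conceptual illustration: the class and method names (`FunctionKindsSketch`, `addTen`, `sumByKey`) are hypothetical, and the code deliberately does not use Drill's custom function interfaces, which are covered on the linked pages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only: plain Java, NOT Drill's custom function interfaces.
class FunctionKindsSketch {

    // "Simple" function: one input row value in, one output value out.
    // A query engine would call this once for each row.
    static int addTen(int value) {
        return value + 10;
    }

    // "Aggregate" function: many input rows in, one output value per group.
    // Mirrors SUM(value) with GROUP BY key; keys[i] and values[i] form row i.
    static Map<String, Integer> sumByKey(String[] keys, int[] values) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            totals.merge(keys[i], values[i], Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] states = {"NV", "AZ", "NV"};
        int[] counts = {10, 5, 7};
        for (int count : counts) {
            System.out.println(addTen(count)); // one output per input row
        }
        System.out.println(sumByKey(states, counts)); // one output per group
    }
}
```

The simple function never sees more than one row at a time, while the aggregate keeps running state across rows and emits a single value for each grouping key.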
+ +Click on one of the following links to learn how to create custom functions +for Drill: + + * [Developing a Simple Function](/confluence/display/DRILL/Developing+a+Simple+Function) + * [Developing an Aggregate Function](/confluence/display/DRILL/Developing+an+Aggregate+Function) + * [Adding Custom Functions to Drill](/confluence/display/DRILL/Adding+Custom+Functions+to+Drill) + * [Using Custom Functions in Queries](/confluence/display/DRILL/Using+Custom+Functions+in+Queries) + * [Custom Function Interfaces](/confluence/display/DRILL/Custom+Function+Interfaces) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/008-manage.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/008-manage.md b/_docs/drill-docs/008-manage.md new file mode 100644 index 0000000..e629b20 --- /dev/null +++ b/_docs/drill-docs/008-manage.md @@ -0,0 +1,23 @@ +--- +title: "Manage Drill" +parent: "Apache Drill Documentation" +--- +When using Drill, you may need to stop and restart a Drillbit on a node, or +modify various options. For example, the default storage format for CTAS +statements is Parquet. You can modify the default setting so that output data +is stored in CSV or JSON format. + +You can use certain SQL commands to manage Drill from within the Drill shell +(SQLLine). You can also modify Drill configuration options, such as memory +allocation, in Drill's configuration files. 
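
As a rough sketch of changing the default CTAS output format mentioned above, the following Java helper only builds the SQL text that you could then issue from SQLLine or through the JDBC driver. It assumes the option is named `store.format` and is changed with `ALTER SESSION SET`; neither detail is stated on this page, so check the Configuration Options page before relying on it. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: builds the SQL text only; it does not connect to Drill.
class StoreFormatSketch {

    // Formats named in the text above (Parquet is the stated default).
    static final List<String> FORMATS = Arrays.asList("parquet", "csv", "json");

    // ASSUMPTION: the option is named store.format and is changed with
    // ALTER SESSION SET; neither detail appears in the text above.
    static String alterSessionStoreFormat(String format) {
        String normalized = format.toLowerCase(Locale.ROOT);
        if (!FORMATS.contains(normalized)) {
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        // Backticks quote the option name because it contains a period.
        return "ALTER SESSION SET `store.format` = '" + normalized + "';";
    }

    public static void main(String[] args) {
        System.out.println(alterSessionStoreFormat("CSV"));
    }
}
```

Validating the format against a fixed list keeps arbitrary input out of the generated statement.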
+ +Refer to the following documentation for information about managing Drill in +your cluster: + + * [Configuration Options](/confluence/display/DRILL/Configuration+Options) + * [Starting/Stopping Drill](/confluence/pages/viewpage.action?pageId=44994063) + * [Ports Used by Drill](/confluence/display/DRILL/Ports+Used+by+Drill) + * [Partition Pruning](/confluence/display/DRILL/Partition+Pruning) + * [Monitoring and Canceling Queries in the Drill Web UI](/confluence/display/DRILL/Monitoring+and+Canceling+Queries+in+the+Drill+Web+UI) + + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/009-develop.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/009-develop.md b/_docs/drill-docs/009-develop.md new file mode 100644 index 0000000..d95f986 --- /dev/null +++ b/_docs/drill-docs/009-develop.md @@ -0,0 +1,16 @@ +--- +title: "Develop Drill" +parent: "Apache Drill Documentation" +--- +To develop Drill, you compile Drill from source code and then set up a project +in Eclipse for use as your development environment. To review or contribute to +Drill code, you must complete the steps required to install and use the Drill +patch review tool. 
+ +For information about contributing to the Apache Drill project, you can refer +to the following pages: + + * [Compiling Drill from Source](/confluence/display/DRILL/Compiling+Drill+from+Source) + * [Setting Up Your Development Environment](/confluence/display/DRILL/Setting+Up+Your+Development+Environment) + * [Drill Patch Review Tool](/confluence/display/DRILL/Drill+Patch+Review+Tool) + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/010-rn.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/010-rn.md b/_docs/drill-docs/010-rn.md new file mode 100644 index 0000000..f196714 --- /dev/null +++ b/_docs/drill-docs/010-rn.md @@ -0,0 +1,192 @@ +--- +title: "Release Notes" +parent: "Apache Drill Documentation" +--- +## Apache Drill 0.7.0 Release Notes + +Apache Drill 0.7.0, the third beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill. It also continues the +Drill monthly release cycle as we drive towards general availability. + +This release is available as +[binary](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache- +drill-0.7.0.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache- +drill-0.7.0-src.tar.gz) tarballs that are compiled against Apache Hadoop. +Drill has been tested against MapR, Cloudera, and Hortonworks Hadoop +distributions. 
There are associated build profiles and JIRAs that can help you +run Drill against your preferred distribution. + +Apache Drill 0.7.0 Key Features + + * No more dependency on UDP/Multicast - making it possible for Drill to work well in the following scenarios: + + * UDP multicast not enabled (as in EC2) + + * Cluster spans multiple subnets + + * Cluster has multihome configuration + + * New functions to natively work with nested data - KVGen and Flatten + + * Support for Hive 0.13 (Hive 0.12 is no longer supported with Drill) + + * Improved performance when querying Hive tables and file systems through partition pruning + + * Improved performance for HBase with LIKE operator pushdown + + * Improved memory management + + * Drill web UI monitoring and query profile improvements + + * Ability to parse files without explicit extensions using default storage format specification + + * Fixes for dealing with complex/nested data objects in Parquet/JSON + + * Fast schema return - Improved experience working with BI/query tools by returning metadata quickly + + * Several hang-related fixes + + * Parquet writer fixes for handling large datasets + + * Stability improvements in ODBC and JDBC drivers + +Apache Drill 0.7.0 Key Notes and Limitations + + * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. + +## Apache Drill 0.6.0 Release Notes + +Apache Drill 0.6.0, the second beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill. It also continues the +Drill monthly release cycle as we drive towards general availability. 
+ +This release is available as [binary](http://www.apache.org/dyn/closer.cgi/inc +ubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incu +bating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled +against Apache Hadoop. Drill has been tested against MapR, Cloudera, and +Hortonworks Hadoop distributions. There are associated build profiles and +JIRAs that can help you run Drill against your preferred distribution. + +Apache Drill 0.6.0 Key Features + +This release is primarily a bug fix release, with [more than 30 JIRAs closed]( +https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&vers +ion=12327472), but there are some notable features: + + * Direct ANSI SQL access to MongoDB, using the latest [MongoDB Plugin for Apache Drill](/confluence/display/DRILL/MongoDB+Plugin+for+Apache+Drill) + * Filesystem query performance improvements with partition pruning + * Ability to use the file system as a persistent store for query profiles and diagnostic information + * Window function support (alpha) + +Apache Drill 0.6.0 Key Notes and Limitations + + * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. + +## Apache Drill 0.5.0 Release Notes + +Apache Drill 0.5.0, the first beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill. It also continues the +Drill monthly release cycle as we drive towards general availability. 
+ +The 0.5.0 release is primarily a bug fix release, with [more than 100 JIRAs](h +ttps://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&versi +on=12324880) closed, but there are some notable features. For information +about the features, see the [Apache Drill Blog for the 0.5.0 +release](https://blogs.apache.org/drill/entry/apache_drill_beta_release_see). + +This release is available as [binary](http://www.apache.org/dyn/closer.cgi/inc +ubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incu +bating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled +against Apache Hadoop. Drill has been tested against MapR, Cloudera, and +Hortonworks Hadoop distributions. There are associated build profiles and +JIRAs that can help you run Drill against your preferred distribution. + +Apache Drill 0.5.0 Key Notes and Limitations + + * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. + * There are known issues with joining text files without using an intervening view. See [DRILL-1401](https://issues.apache.org/jira/browse/DRILL-1401) for more information. + +## Apache Drill 0.4.0 Release Notes + +The 0.4.0 release is a developer preview release, designed to help enthusiasts +start to work with and experiment with Drill. It is the first Drill release +that provides distributed query execution. + +This release is built upon [more than 800 +JIRAs](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324963/). 
+It is a pre-beta release on the way towards Drill. As a developer snapshot, +the release contains a large number of outstanding bugs that will make some +use cases challenging. Feel free to consult outstanding issues [targeted for +the 0.5.0 +release](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324880/) +to see whether your use case is affected. + +To read more about this release and new features introduced, please view the +[0.4.0 announcement blog +entry](https://blogs.apache.org/drill/entry/announcing_apache_drill_0_4). + +The release is available as both [binary](http://www.apache.org/dyn/closer.cgi +/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating.tar.gz) +and [source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0- +incubating/apache-drill-0.4.0-incubating-src.tar.gz) tarballs. In both cases, +these are compiled against Apache Hadoop. Drill has also been tested against +MapR, Cloudera and Hortonworks Hadoop distributions and there are associated +build profiles or JIRAs that can help you run against your preferred +distribution. + +Some Key Notes & Limitations + + * The current release supports in-memory and beyond-memory execution. However, users must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * In many cases, merge join operations return incorrect results. + * Use of a local filter in a join “on” clause when using left, right or full outer joins may result in incorrect results. + * Because of known memory leaks and memory overrun issues, you may need more memory and you may need to restart the system in some cases. + * Some types of complex expressions, especially those involving empty arrays, may fail or return incorrect results. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior (such as Sort). 
Other operations (such as streaming aggregate) may have partial support that leads to unexpected results. + * Protobuf, UDF, query plan interfaces and all interfaces are subject to change in incompatible ways. + * Multiplication of some types of DECIMAL(28+,*) will return incorrect results. + +## Apache Drill M1 -- Release Notes (Apache Drill Alpha) + +### Milestone 1 Goals + +The first release of Apache Drill is designed as a technology preview for +people to better understand the architecture and vision. It is a functional +release trying to piece together the key components of a next generation MPP +query engine. It is designed to allow milestone 2 (M2) to focus on +architectural analysis and performance optimization. + + * Provide a new optimistic DAG execution engine for data analysis + * Build a new columnar shredded in-memory format and execution model that minimizes data serialization/deserialization costs and operator complexity + * Provide a model for runtime generated functions and relational operators that minimizes complexity and maximizes performance + * Support queries against columnar on disk format (Parquet) and JSON + * Support the most common set of standard SQL read-only phrases using ANSI standards. Includes: SELECT, FROM, WHERE, HAVING, ORDER BY, GROUP BY, IN, DISTINCT, LEFT JOIN, RIGHT JOIN, INNER JOIN + * Support schema-on-read querying and execution + * Build a set of columnar operation primitives including Merge Join, Sort, Streaming Aggregate, Filter, Selection Vector removal. + * Support unlimited levels of subqueries and correlated subqueries + * Provide an extensible query-language agnostic JSON-based logical data flow syntax. + * Support complex data type manipulation via logical plan operations + +### Known Issues + +SQL Parsing +Because Apache Drill is built to support late-bound changing schemas while SQL +is statically typed, there are a couple of special requirements for +writing SQL queries. 
These are limited to the current release and +will be corrected in a future milestone release. + + * All tables are exposed as a single map field that contains + * Drill Alpha doesn't support implicit or explicit casts outside those required above. + * Drill Alpha does not include, there are currently a couple of differences for how to write a query in In order to query against + +UDFs + + * Drill currently supports simple and aggregate functions using scalar, repeated and + * Nested data support is incomplete. Drill Alpha supports nested data structures as well as repeated fields. However, + * asd + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/011-contribute.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/011-contribute.md b/_docs/drill-docs/011-contribute.md new file mode 100644 index 0000000..282ab8a --- /dev/null +++ b/_docs/drill-docs/011-contribute.md @@ -0,0 +1,11 @@ +--- +title: "Contribute to Drill" +parent: "Apache Drill Documentation" +--- +The Apache Drill community welcomes your support. Please read [Apache Drill +Contribution Guidelines](https://cwiki.apache.org/confluence/display/DRILL/Apa +che+Drill+Contribution+Guidelines) for information about how to contribute to +the project. If you would like to contribute to the project and need some +ideas for what to do, please read [Apache Drill Contribution +Ideas](/confluence/display/DRILL/Apache+Drill+Contribution+Ideas). 
+ http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/012-sample-ds.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/012-sample-ds.md b/_docs/drill-docs/012-sample-ds.md new file mode 100644 index 0000000..fe63f6b --- /dev/null +++ b/_docs/drill-docs/012-sample-ds.md @@ -0,0 +1,11 @@ +--- +title: "Sample Datasets" +parent: "Apache Drill Documentation" +--- +Use any of the following sample datasets provided to test Drill: + + * [AOL Search](/confluence/display/DRILL/AOL+Search) + * [Enron Emails](/confluence/display/DRILL/Enron+Emails) + * [Wikipedia Edit History](/confluence/display/DRILL/Wikipedia+Edit+History) + + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/013-design.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/013-design.md b/_docs/drill-docs/013-design.md new file mode 100644 index 0000000..57d73c1 --- /dev/null +++ b/_docs/drill-docs/013-design.md @@ -0,0 +1,14 @@ +--- +title: "Design Docs" +parent: "Apache Drill Documentation" +--- +Review the Apache Drill design docs for early descriptions of Apache Drill +functionality, terms, and goals, and reference the research articles to learn +about Apache Drill's history: + + * [Drill Plan Syntax](/confluence/display/DRILL/Drill+Plan+Syntax) + * [RPC Overview](/confluence/display/DRILL/RPC+Overview) + * [Query Stages](/confluence/display/DRILL/Query+Stages) + * [Useful Research](/confluence/display/DRILL/Useful+Research) + * [Value Vectors](/confluence/display/DRILL/Value+Vectors) + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/014-progress.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/014-progress.md b/_docs/drill-docs/014-progress.md new file mode 100644 index 0000000..2a1538c --- /dev/null +++ b/_docs/drill-docs/014-progress.md @@ -0,0 +1,9 @@ +--- +title: 
"Progress Reports" +parent: "Apache Drill Documentation" +--- +Review the following Apache Drill progress reports for a summary of issues, +progression of the project, summary of mailing list discussions, and events: + + * [2014 Q1 Drill Report](/confluence/display/DRILL/2014+Q1+Drill+Report) + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/015-archived-pages.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/015-archived-pages.md b/_docs/drill-docs/015-archived-pages.md new file mode 100644 index 0000000..b2a29c3 --- /dev/null +++ b/_docs/drill-docs/015-archived-pages.md @@ -0,0 +1,9 @@ +--- +title: "Archived Pages" +parent: "Apache Drill Documentation" +--- +The following pages have been archived: + +* How to Run Drill with Sample Data +* Meet Apache Drill + http://git-wip-us.apache.org/repos/asf/drill/blob/84b7b36d/_docs/drill-docs/016-bylaws.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/016-bylaws.md b/_docs/drill-docs/016-bylaws.md new file mode 100644 index 0000000..6f2604f --- /dev/null +++ b/_docs/drill-docs/016-bylaws.md @@ -0,0 +1,171 @@ +--- +title: "Project Bylaws" +parent: "Apache Drill Documentation" +--- +# Introduction + +This document defines the bylaws under which the Apache Drill project +operates. It defines the roles and responsibilities of the project, who may +vote, how voting works, how conflicts are resolved, etc. + +Drill is a project of the [Apache Software +Foundation](http://www.apache.org/foundation/). The foundation holds the +copyright on Apache code including the code in the Drill codebase. The +[foundation FAQ](http://www.apache.org/foundation/faq.html) explains the +operation and background of the foundation. + +Drill is typical of Apache projects in that it operates under a set of +principles, known collectively as the _Apache Way_. 
If you are new to Apache +development, please refer to the [Incubator +project](http://incubator.apache.org/) for more information on how Apache +projects operate. + +# Roles and Responsibilities + +Apache projects define a set of roles with associated rights and +responsibilities. These roles govern what tasks an individual may perform +within the project. The roles are defined in the following sections. + +## Users + +The most important participants in the project are people who use our +software. The majority of our contributors start out as users and guide their +development efforts from the user's perspective. + +Users contribute to the Apache projects by providing feedback to contributors +in the form of bug reports and feature suggestions. As well, users participate +in the Apache community by helping other users on mailing lists and user +support forums. + +## Contributors + +All of the volunteers who are contributing time, code, documentation, or +resources to the Drill Project. A contributor that makes sustained, welcome +contributions to the project may be invited to become a committer, though the +exact timing of such invitations depends on many factors. + +## Committers + +The project's committers are responsible for the project's technical +management. Committers have access to a specified set of subproject's code +repositories. Committers on subprojects may cast binding votes on any +technical discussion regarding that subproject. + +Committer access is by invitation only and must be approved by lazy consensus +of the active PMC members. A Committer is considered _emeritus_ by his or her +own declaration or by not contributing in any form to the project for over six +months. An emeritus committer may request reinstatement of commit access from +the PMC which will be sufficient to restore him or her to active committer +status. 
+ +Commit access can be revoked by a unanimous vote of all the active PMC members +(except the committer in question if he or she is also a PMC member). + +All Apache committers are required to have a signed [Contributor License +Agreement (CLA)](http://www.apache.org/licenses/icla.txt) on file with the +Apache Software Foundation. There is a [Committer +FAQ](http://www.apache.org/dev/committers.html) which provides more details on +the requirements for committers. + +A committer who makes a sustained contribution to the project may be invited +to become a member of the PMC. The form of contribution is not limited to +code. It can also include code review, helping out users on the mailing lists, +documentation, etc. + +## Project Management Committee + +The PMC is responsible to the board and the ASF for the management and +oversight of the Apache Drill codebase. The responsibilities of the PMC +include: + + * Deciding what is distributed as products of the Apache Drill project. In particular, all releases must be approved by the PMC. + * Maintaining the project's shared resources, including the codebase repository, mailing lists, and websites. + * Speaking on behalf of the project. + * Resolving license disputes regarding products of the project. + * Nominating new PMC members and committers. + * Maintaining these bylaws and other guidelines of the project. + +Membership of the PMC is by invitation only and must be approved by a lazy +consensus of active PMC members. A PMC member is considered _emeritus_ by his +or her own declaration or by not contributing in any form to the project for +over six months. An emeritus member may request reinstatement to the PMC, +which will be sufficient to restore him or her to active PMC member status. + +Membership of the PMC can be revoked by a unanimous vote of all the active +PMC members other than the member in question. + +The chair of the PMC is appointed by the ASF board. 
The chair is an office +holder of the Apache Software Foundation (Vice President, Apache Drill) and +has primary responsibility to the board for the management of the projects +within the scope of the Drill PMC. The chair reports to the board quarterly on +developments within the Drill project. + +The term of the chair is one year. When the current chair's term is up or if +the chair resigns before the end of his or her term, the PMC votes to +recommend a new chair using lazy consensus, but the decision must be ratified +by the Apache board. + +# Decision Making + +Within the Drill project, different types of decisions require different forms +of approval. For example, the previous section describes several decisions +which require 'lazy consensus' approval. This section defines how voting is +performed, the types of approvals, and which types of decision require which +type of approval. + +## Voting + +Decisions regarding the project are made by votes on the primary project +development mailing list +_[dev@drill.apache.org](mailto:dev@drill.apache.org)_. Where necessary, PMC +voting may take place on the private Drill PMC mailing list +[private@drill.apache.org](mailto:private@drill.apache.org). Votes are clearly +indicated by subject line starting with [VOTE]. Votes may contain multiple +items for approval and these should be clearly separated. Voting is carried +out by replying to the vote mail. Voting may take four flavors. + +

| Vote | Meaning |
|------|---------|
| +1 | 'Yes,' 'Agree,' or 'the action should be performed.' In general, this vote also indicates a willingness on the part of the voter to help 'make it happen'. |
| +0 | This vote indicates a willingness for the action under consideration to go ahead. The voter, however, will not be able to help. |
| -0 | This vote indicates that the voter does not, in general, agree with the proposed action but is not concerned enough to prevent the action going ahead. |
| -1 | This is a negative vote. On issues where consensus is required, this vote counts as a veto. All vetoes must contain an explanation of why the veto is appropriate. Vetoes with no explanation are void. It may also be appropriate for a -1 vote to include an alternative course of action. |

+ +All participants in the Drill project are encouraged to show their agreement +or disagreement with a particular action by voting. For technical decisions, only +the votes of active committers are binding. Non-binding votes are still useful +for those with binding votes to understand the perception of an action in the +wider Drill community. For PMC decisions, only the votes of PMC members are +binding. + +Voting can also be applied to changes already made to the Drill codebase. +These typically take the form of a veto (-1) in reply to the commit message +sent when the commit is made. Note that this should be a rare occurrence. All +efforts should be made to discuss issues when they are still patches before +the code is committed. + +## Approvals + +These are the types of approvals that can be sought. Different actions require +different types of approvals. + +

| Approval Type | Definition |
|---------------|------------|
| Consensus | For this to pass, all voters with binding votes must vote and there can be no binding vetoes (-1). Consensus votes are rarely required due to the impracticality of getting all eligible voters to cast a vote. |
| Lazy Consensus | Lazy consensus requires 3 binding +1 votes and no binding vetoes. |
| Lazy Majority | A lazy majority vote requires 3 binding +1 votes and more binding +1 votes than -1 votes. |
| Lazy Approval | An action with lazy approval is implicitly allowed unless a -1 vote is received, at which time, depending on the type of action, either lazy majority or lazy consensus approval must be obtained. |
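
The approval rules above can be read as simple checks on a list of binding votes. The following is a minimal illustrative sketch, not part of the bylaws or of any Drill tooling: the function names, the representation of votes as strings, and the `eligible_voter_count` parameter are all assumptions made for this example.

```python
from collections import Counter


def lazy_consensus(binding_votes):
    """At least 3 binding +1 votes and no binding vetoes (-1)."""
    counts = Counter(binding_votes)
    return counts["+1"] >= 3 and counts["-1"] == 0


def lazy_majority(binding_votes):
    """At least 3 binding +1 votes and more binding +1 votes than -1 votes."""
    counts = Counter(binding_votes)
    return counts["+1"] >= 3 and counts["+1"] > counts["-1"]


def consensus(binding_votes, eligible_voter_count):
    """Every eligible voter has voted and nobody cast a veto (-1).

    eligible_voter_count is a hypothetical input: the number of people
    holding a binding vote on the action in question.
    """
    everyone_voted = len(binding_votes) == eligible_voter_count
    return everyone_voted and "-1" not in binding_votes


# Example: three +1 votes and one veto.
votes = ["+1", "+1", "+1", "-1"]
print(lazy_consensus(votes))  # the veto blocks lazy consensus
print(lazy_majority(votes))   # +1 votes still outnumber -1 votes
```

Only binding votes are passed in, since non-binding votes do not affect the outcome. Lazy approval is not modeled because it falls back to one of the checks above once a -1 is received.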

+ +## Vetoes + +A valid, binding veto cannot be overruled. If a veto is cast, it must be +accompanied by a valid explanation of the reasons for the veto. The +validity of a veto, if challenged, can be confirmed by anyone who has a +binding vote. This does not necessarily signify agreement with the veto - +merely that the veto is valid. + +If you disagree with a valid veto, you must lobby the person casting the veto +to withdraw his or her veto. If a veto is not withdrawn, the action that has +been vetoed must be reversed in a timely manner. + +## Actions + +This section describes the various actions which are undertaken within the +project, the corresponding approval required for that action and those who +have binding votes over the action. It also specifies the minimum length of +time that a vote must remain open, measured in business days. In general, votes +should not be called at times when it is known that interested members of the +project will be unavailable. + +

| Action | Description | Approval | Binding Votes | Minimum Length (business days) |
|--------|-------------|----------|---------------|--------------------------------|
| Code Change | A change made to a codebase of the project and committed by a committer. This includes source code, documentation, website content, etc. | Consensus approval of active committers, with a minimum of one +1. The code can be committed after the first +1. | Active committers | 1 |
| Release Plan | Defines the timetable and actions for a release. The plan also nominates a Release Manager. | Lazy majority | Active committers | 3 |
| Product Release | When a release of one of the project's products is ready, a vote is required to accept the release as an official release of the project. | Lazy Majority | Active PMC members | 3 |
| Adoption of New Codebase | When the codebase for an existing, released product is to be replaced with an alternative codebase. If such a vote fails to gain approval, the existing codebase will continue. This also covers the creation of new sub-projects within the project. | 2/3 majority | Active PMC members | 6 |
| New Committer | When a new committer is proposed for the project. | Lazy consensus | Active PMC members | 3 |
| New PMC Member | When a committer is proposed for the PMC. | Lazy consensus | Active PMC members | 3 |
| Committer Removal | When removal of commit privileges is sought. Note: Such actions will also be referred to the ASF board by the PMC chair. | Consensus | Active PMC members (excluding the committer in question if a member of the PMC). | 6 |
| PMC Member Removal | When removal of a PMC member is sought. Note: Such actions will also be referred to the ASF board by the PMC chair. | Consensus | Active PMC members (excluding the member in question). | 6 |
| Modifying Bylaws | Modifying this document. | 2/3 majority | Active PMC members | 6 |

+