From: bridgetb@apache.org
To: commits@drill.apache.org
Date: Tue, 15 Sep 2015 23:24:20 -0000
Subject: [3/5] drill git commit: Daniel's changes

http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-commands/055-drop-table.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-commands/055-drop-table.md b/_docs/sql-reference/sql-commands/055-drop-table.md
new file mode 100755
index 0000000..aa32733
--- /dev/null
+++ b/_docs/sql-reference/sql-commands/055-drop-table.md
@@ -0,0 +1,270 @@
+---
+title: "DROP TABLE"
+parent: "SQL Commands"
+---
+
+As of Drill 1.2, you can use the DROP TABLE command to remove tables (files or directories) from a file system when the file system is configured as a DFS storage plugin. See [Storage Plugin Registration]({{ site.baseurl }}/docs/storage-plugin-registration/). Currently, you can issue the DROP TABLE command only against file system data sources.
+
+## Syntax
+The DROP TABLE command supports the following syntax:
+
+    DROP TABLE [workspace.]name;
+
+*name* is a unique directory or file name, optionally prefaced by a storage plugin name, such as `dfs`, and a workspace, such as `tmp`, using dot notation.
+
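+For example, assuming a writable workspace named `tmp` is configured in the `dfs` storage plugin, the following command removes a table named `donuts_json` (the same drop appears in context in Example 2 below):
+
+    DROP TABLE dfs.tmp.donuts_json;
+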
+## Usage Notes
+
+###Schema
+* You must identify the schema in which a table exists to successfully drop the table. You can identify the schema before dropping the table with the USE command (see [USE command]({{ site.baseurl }}/docs/use/)) or when you issue the DROP TABLE command. See [Example 1: Identifying a schema]({{ site.baseurl }}/docs/drop-table/#example-1:-identifying-a-schema).
+* The schema must be mutable. For example, to drop a table from a schema named `dfs.sales`, the "`writable`" attribute for the sales workspace in the DFS storage plugin configuration must be set to `true`. See [Storage Plugin Attributes]({{ site.baseurl }}/docs/plugin-configuration-basics/#storage-plugin-attributes).
+
+###File Type
+* The DROP TABLE command works only against file types that Drill can read, such as Parquet, JSON, or text. See [Querying a File System Introduction]({{ site.baseurl }}/docs/querying-a-file-system-introduction/) for a complete list of supported types.
+* Text formats must be configured in the DFS storage plugin configuration. For example, to support CSV files, the “`format`” attribute in the configuration must include CSV as a value. See [Storage Plugin Attributes]({{ site.baseurl }}/docs/plugin-configuration-basics/#storage-plugin-attributes).
+* The directory on which you issue the DROP TABLE command must contain files of the same type. For example, if a workspace such as `dfs.sales` points to a directory containing subdirectories, such as `/2012` and `/2013`, the files in all of those directories must be of the same type for the DROP TABLE command to succeed against the directory.
+
+###Permissions
+* A user must have the appropriate permissions on the file system to successfully issue the DROP TABLE command. Inadequate permissions result in a failed drop and can potentially remove a subset of the files in a directory.
+
+###User Impersonation
+* When user impersonation is enabled in Drill, Drill impersonates the user issuing the DROP TABLE command. Therefore, the user must have sufficient permissions on the file system for the command to succeed. See [Configuring User Impersonation]({{ site.baseurl }}/docs/configuring-user-impersonation/).
+* When user impersonation is not enabled in Drill, Drill accesses the file system as the user running the Drillbit. This user is typically a super user who has permission to delete most files. In this scenario, use the DROP TABLE command with caution to avoid deleting critical files and directories.
+
+###Views
+* Views are independent of tables. Views that reference dropped tables become invalid. You must explicitly drop any view that references a dropped table using the [DROP VIEW command]({{ site.baseurl }}/docs/drop-view/).
+
+###Concurrency
+* Concurrency occurs when two processes try to access and/or change data at the same time. Currently, Drill does not have a mechanism, such as read locks on files, to address concurrency issues. For example, if one user runs a query that references a table while another user simultaneously issues the DROP TABLE command against it, nothing prevents the two operations from colliding. In such a scenario, Drill may return partial query results or a system error to the user running the query when the table is dropped.
+
+
+## Examples
+
+The following examples show results for several DROP TABLE scenarios.
+
+###Example 1: Identifying a schema
+This example shows you how to identify a schema with the USE and DROP TABLE commands to successfully drop a table named `donuts_json` in the “`donuts`” workspace configured within the DFS storage plugin configuration.
+
+The "`donuts`" workspace is configured within the following DFS configuration:
+
+    {
+      "type": "file",
+      "enabled": true,
+      "connection": "file:///",
+      "workspaces": {
+        "root": {
+          "location": "/",
+          "writable": false,
+          "defaultInputFormat": null
+        },
+        "donuts": {
+          "location": "/Users/user1/donuts",
+          "writable": true,
+          "defaultInputFormat": null
+        }
+      },
+
+Issue the `USE dfs.donuts` command to change the schema to `dfs.donuts` before issuing the `DROP TABLE` command.
+
+    0: jdbc:drill:zk=local> use dfs.donuts;
+    +-------+-----------------------------------------+
+    | ok | summary |
+    +-------+-----------------------------------------+
+    | true | Default schema changed to [dfs.donuts] |
+    +-------+-----------------------------------------+
+    1 row selected (0.096 seconds)
+
+    0: jdbc:drill:zk=local> drop table donuts_json;
+    +-------+------------------------------+
+    | ok | summary |
+    +-------+------------------------------+
+    | true | Table [donuts_json] dropped |
+    +-------+------------------------------+
+    1 row selected (0.094 seconds)
+
+Alternatively, instead of issuing the `USE` command to change the schema, you can include the schema name when you drop the table.
+
+    0: jdbc:drill:zk=local> drop table dfs.donuts.donuts_json;
+    +-------+------------------------------+
+    | ok | summary |
+    +-------+------------------------------+
+    | true | Table [donuts_json] dropped |
+    +-------+------------------------------+
+    1 row selected (1.189 seconds)
+
+Drill returns the following error when the schema is not identified:
+
+    0: jdbc:drill:zk=local> drop table donuts_json;
+
+    Error: PARSE ERROR: Root schema is immutable. Creating or dropping tables/views is not allowed in root schema.Select a schema using 'USE schema' command.
+    [Error Id: 8c42cb6a-27eb-48fd-b42a-671a6fb58c14 on 10.250.56.218:31010] (state=,code=0)
+
+###Example 2: Dropping a table created from a file
+In the following example, the `donuts_json` table is removed from the `/tmp` workspace using the `DROP TABLE` command. This example assumes that the steps in the [Complete CTAS Example]({{ site.baseurl }}/docs/create-table-as-ctas/#complete-ctas-example) were already completed.
+
+Running an `ls` on the `/tmp` directory shows the `donuts_json` directory.
+
+    $ pwd
+    /tmp
+    $ cd donuts_json
+    $ ls
+    0_0_0.json
+    $ more 0_0_0.json
+    {
+      "id" : "0001",
+      "type" : "donut",
+      "name" : "Cake",
+      "ppu" : 0.55
+    }
+
+Issuing `USE dfs.tmp` changes the schema to `dfs.tmp`.
+
+    0: jdbc:drill:zk=local> use dfs.tmp;
+    +-------+-----------------------------------------+
+    | ok | summary |
+    +-------+-----------------------------------------+
+    | true | Default schema changed to [dfs.tmp] |
+    +-------+-----------------------------------------+
+    1 row selected (0.085 seconds)
+
+Running the `DROP TABLE` command removes the table from the schema.
+
+    0: jdbc:drill:zk=local> drop table donuts_json;
+    +-------+------------------------------+
+    | ok | summary |
+    +-------+------------------------------+
+    | true | Table [donuts_json] dropped |
+    +-------+------------------------------+
+    1 row selected (0.107 seconds)
+
+###Example 3: Dropping a table created as a directory
+When you create a table that writes files to a directory, you can issue the `DROP TABLE` command against the table to remove the directory. All files and subdirectories are deleted. For example, the following `CTAS` command writes Parquet data from the `nation.parquet` file, installed with Drill, to the `/tmp/name_key` directory.
+
+Issue the `USE` command to change the schema.
+
+    0: jdbc:drill:zk=local> USE dfs;
+
+Create a table using the `CTAS` command.
+
+    0: jdbc:drill:zk=local> CREATE TABLE tmp.`name_key` (N_NATIONKEY, N_NAME) AS SELECT N_NATIONKEY, N_NAME FROM dfs.`/Users/drilluser/apache-drill-1.2.0/sample-data/nation.parquet`;
+    +-----------+----------------------------+
+    | Fragment | Number of records written |
+    +-----------+----------------------------+
+    | 0_0 | 25 |
+    +-----------+----------------------------+
+
+Query the directory to see the data.
+
+    0: jdbc:drill:zk=local> select * from tmp.`name_key`;
+    +--------------+-----------------+
+    | N_NATIONKEY | N_NAME |
+    +--------------+-----------------+
+    | 0 | ALGERIA |
+    | 1 | ARGENTINA |
+    | 2 | BRAZIL |
+    | 3 | CANADA |
+    | 4 | EGYPT |
+    | 5 | ETHIOPIA |
+    | 6 | FRANCE |
+    | 7 | GERMANY |
+    | 8 | INDIA |
+    | 9 | INDONESIA |
+    | 10 | IRAN |
+    | 11 | IRAQ |
+    | 12 | JAPAN |
+    | 13 | JORDAN |
+    | 14 | KENYA |
+    | 15 | MOROCCO |
+    | 16 | MOZAMBIQUE |
+    | 17 | PERU |
+    | 18 | CHINA |
+    | 19 | ROMANIA |
+    | 20 | SAUDI ARABIA |
+    | 21 | VIETNAM |
+    | 22 | RUSSIA |
+    | 23 | UNITED KINGDOM |
+    | 24 | UNITED STATES |
+    +--------------+-----------------+
+    25 rows selected (0.183 seconds)
+
+Issue the `DROP TABLE` command against the directory to remove the directory and delete all files and subdirectories within it.
+
+    0: jdbc:drill:zk=local> drop table name_key;
+    +-------+---------------------------+
+    | ok | summary |
+    +-------+---------------------------+
+    | true | Table [name_key] dropped |
+    +-------+---------------------------+
+    1 row selected (0.086 seconds)
+
+###Example 4: Dropping a table that does not exist
+The following example shows the result of dropping a table that does not exist because it has already been dropped or it never existed.
+
+    0: jdbc:drill:zk=local> use dfs.tmp;
+    +-------+--------------------------------------+
+    | ok | summary |
+    +-------+--------------------------------------+
+    | true | Default schema changed to [dfs.tmp] |
+    +-------+--------------------------------------+
+    1 row selected (0.289 seconds)
+
+    0: jdbc:drill:zk=local> drop table name_key;
+
+    Error: VALIDATION ERROR: Table [name_key] not found
+    [Error Id: fc6bfe17-d009-421c-8063-d759d7ea2f4e on 10.250.56.218:31010] (state=,code=0)
+
+###Example 5: Dropping a table without permissions
+The following example shows the result of dropping a table without appropriate permissions in the file system.
+
+    0: jdbc:drill:zk=local> drop table name_key;
+
+    Error: PERMISSION ERROR: Unauthorized to drop table
+    [Error Id: 36f6b51a-786d-4950-a4a7-44250f153c55 on 10.10.30.167:31010] (state=,code=0)
+
+###Example 6: Dropping and querying a table concurrently
+
+The result of this scenario depends on the timing between one user dropping a table and another user issuing a query against it, so results can vary. In some instances, the drop succeeds and the query fails completely; in others, the query completes partially and then the table is dropped, returning an exception in the middle of the query results.
+
+The following example shows the result of dropping a table and issuing a query against the table simultaneously. In this example, the table is dropped before the query can run against it.
+
+**User 1 issues the DROP TABLE command.**
+
+    0: jdbc:drill:zk=local> drop table name_key;
+    +-------+------------------------------+
+    | ok | summary |
+    +-------+------------------------------+
+    | true | Table [name_key] dropped |
+    +-------+------------------------------+
+    1 row selected (12.35 seconds)
+
+**User 2 issues a query against the table.**
+
+    0: jdbc:drill:zk=local> select * from name_key;
+
+    Error: SYSTEM ERROR: FileNotFoundException: Requested file does not exist.
+    Fragment 1:0
+    [Error Id: 6e3c6a8d-8cfd-4033-90c4-61230af80573 on 10.10.30.167:31010] (state=,code=0)
+
+###Example 7: Dropping a table with different file formats
+The following example shows the result of dropping a table when multiple file formats exist in the directory. In this scenario, the `sales_dir` table resides in the `dfs.sales` workspace and contains Parquet, CSV, and JSON files.
+
+Running `ls` on `sales_dir` shows that different file types exist in the directory.
+
+    $ cd sales_dir/
+    $ ls
+    0_0_0.parquet sales_a.csv sales_b.json sales_c.parquet
+
+Issuing the `DROP TABLE` command on the directory results in an error.
+
+    0: jdbc:drill:zk=local> drop table dfs.sales.sales_dir;
+
+    Error: VALIDATION ERROR: Table contains different file formats.
+    Drop Table is only supported for directories that contain homogeneous file formats consumable by Drill
+    [Error Id: 062f68c9-f2cd-4033-9b3d-182146a96904 on 10.250.56.218:31010] (state=,code=0)
+
+
+
+
+
+
http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-commands/055-drop-view.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-commands/055-drop-view.md b/_docs/sql-reference/sql-commands/055-drop-view.md
deleted file mode 100644
index 4e8b477..0000000
--- a/_docs/sql-reference/sql-commands/055-drop-view.md
+++ /dev/null
@@ -1,47 +0,0 @@
----
-title: "DROP VIEW"
-parent: "SQL Commands"
----
-
-The DROP VIEW command removes a view that was created in a workspace using the CREATE VIEW command.
-
-## Syntax
-
-The DROP VIEW command supports the following syntax:
-
-    DROP VIEW [workspace.]view_name;
-
-## Usage Notes
-
-When you drop a view, all information about the view is deleted from the workspace in which it was created. DROP VIEW applies to the view only, not to the underlying data sources used to create the view. However, if you drop a view that another view is dependent on, you can no longer use the dependent view. If the underlying tables or views change after a view is created, you may want to drop and re-create the view. Alternatively, you can use the CREATE OR REPLACE VIEW syntax to update the view.
-
-## Example
-
-This example shows you some steps to follow when you want to drop a view in Drill using the DROP VIEW command. A workspace named “donuts” was created for the steps in this example.
-Complete the following steps to drop a view in Drill:
-Use the writable workspace from which the view was created.
-
-    0: jdbc:drill:zk=local> use dfs.donuts;
-    +------------+------------+
-    | ok | summary |
-    +------------+------------+
-    | true | Default schema changed to 'dfs.donuts' |
-    +------------+------------+
-
-Use the DROP VIEW command to remove a view created in the current workspace.
- - 0: jdbc:drill:zk=local> drop view mydonuts; - +------------+------------+ - | ok | summary | - +------------+------------+ - | true | View 'mydonuts' deleted successfully from 'dfs.donuts' schema | - +------------+------------+ - -Use the DROP VIEW command to remove a view created in another workspace. - - 0: jdbc:drill:zk=local> drop view dfs.tmp.yourdonuts; - +------------+------------+ - | ok | summary | - +------------+------------+ - | true | View 'yourdonuts' deleted successfully from 'dfs.tmp' schema | - +------------+------------+ http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-commands/056-drop-view.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-commands/056-drop-view.md b/_docs/sql-reference/sql-commands/056-drop-view.md new file mode 100644 index 0000000..4e8b477 --- /dev/null +++ b/_docs/sql-reference/sql-commands/056-drop-view.md @@ -0,0 +1,47 @@ +--- +title: "DROP VIEW" +parent: "SQL Commands" +--- + +The DROP VIEW command removes a view that was created in a workspace using the CREATE VIEW command. + +## Syntax + +The DROP VIEW command supports the following syntax: + + DROP VIEW [workspace.]view_name; + +## Usage Notes + +When you drop a view, all information about the view is deleted from the workspace in which it was created. DROP VIEW applies to the view only, not to the underlying data sources used to create the view. However, if you drop a view that another view is dependent on, you can no longer use the dependent view. If the underlying tables or views change after a view is created, you may want to drop and re-create the view. Alternatively, you can use the CREATE OR REPLACE VIEW syntax to update the view. + +## Example + +This example shows you some steps to follow when you want to drop a view in Drill using the DROP VIEW command. A workspace named “donuts” was created for the steps in this example. +Complete the following steps to drop a view in Drill: +Use the writable workspace from which the view was created. + + 0: jdbc:drill:zk=local> use dfs.donuts; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | Default schema changed to 'dfs.donuts' | + +------------+------------+ + +Use the DROP VIEW command to remove a view created in the current workspace. + + 0: jdbc:drill:zk=local> drop view mydonuts; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | View 'mydonuts' deleted successfully from 'dfs.donuts' schema | + +------------+------------+ + +Use the DROP VIEW command to remove a view created in another workspace. + + 0: jdbc:drill:zk=local> drop view dfs.tmp.yourdonuts; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | View 'yourdonuts' deleted successfully from 'dfs.tmp' schema | + +------------+------------+ http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-commands/060-describe.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-commands/060-describe.md b/_docs/sql-reference/sql-commands/060-describe.md deleted file mode 100644 index 837494c..0000000 --- a/_docs/sql-reference/sql-commands/060-describe.md +++ /dev/null @@ -1,99 +0,0 @@ ---- -title: "DESCRIBE" -parent: "SQL Commands" ---- -The DESCRIBE command returns information about columns in a table or view. 
- -## Syntax - -The DESCRIBE command supports the following syntax: - - DESCRIBE [workspace.]table_name|view_name - -## Usage Notes - -You can issue the DESCRIBE command against views created in a workspace and -tables created in Hive and HBase. You can issue the DESCRIBE command -on a table or view from any schema. For example, if you are working in the -`dfs.myworkspace` schema, you can issue the DESCRIBE command on a view or -table in another schema. Currently, DESCRIBE does not support tables created -in a file system. - -Drill only supports SQL data types. Verify that all data types in an external -data source, such as Hive or HBase, map to supported data types in Drill. See -Drill Data Type Mapping for more information. - -## Example - -The following example demonstrates the steps that you can follow when you want -to use the DESCRIBE command to see column information for a view and for Hive -and HBase tables. - -Complete the following steps to use the DESCRIBE command: - - 1. Issue the USE command to switch to a particular schema. - - 0: jdbc:drill:zk=drilldemo:5181> use hive; - +------------+------------+ - | ok | summary | - +------------+------------+ - | true | Default schema changed to 'hive' | - +------------+------------+ - 1 row selected (0.025 seconds) - - 2. Issue the SHOW TABLES command to see the existing tables in the schema. - - 0: jdbc:drill:zk=drilldemo:5181> show tables; - +--------------+------------+ - | TABLE_SCHEMA | TABLE_NAME | - +--------------+------------+ - | hive.default | orders | - | hive.default | products | - +--------------+------------+ - 2 rows selected (0.438 seconds) - - 3. Issue the DESCRIBE command on a table. - - 0: jdbc:drill:zk=drilldemo:5181> describe orders; - +-------------+------------+-------------+ - | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | - +-------------+------------+-------------+ - | order_id | BIGINT | YES | - | month | VARCHAR | YES | - | purchdate | TIMESTAMP | YES | - | cust_id | BIGINT | YES | - | state | VARCHAR | YES | - | prod_id | BIGINT | YES | - | order_total | INTEGER | YES | - +-------------+------------+-------------+ - 7 rows selected (0.64 seconds) - - 4. Issue the DESCRIBE command on a table in another schema from the current schema. - - 0: jdbc:drill:zk=drilldemo:5181> describe hbase.customers; - +-------------+------------+-------------+ - | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | - +-------------+------------+-------------+ - | row_key | ANY | NO | - | address | (VARCHAR(1), ANY) MAP | NO | - | loyalty | (VARCHAR(1), ANY) MAP | NO | - | personal | (VARCHAR(1), ANY) MAP | NO | - +-------------+------------+-------------+ - 4 rows selected (0.671 seconds) - - 5. Issue the DESCRIBE command on a view in another schema from the current schema. 
-
-    0: jdbc:drill:zk=drilldemo:5181> describe dfs.views.customers_vw;
-    +-------------+------------+-------------+
-    | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
-    +-------------+------------+-------------+
-    | cust_id | BIGINT | NO |
-    | name | VARCHAR | NO |
-    | address | VARCHAR | NO |
-    | gender | VARCHAR | NO |
-    | age | VARCHAR | NO |
-    | agg_rev | VARCHAR | NO |
-    | membership | VARCHAR | NO |
-    +-------------+------------+-------------+
-    7 rows selected (0.403 seconds)
-
http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-conditional-expressions/010-case-expression.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-conditional-expressions/010-case-expression.md b/_docs/sql-reference/sql-conditional-expressions/010-case-expression.md
index 90c7caa..9336b1c 100644
--- a/_docs/sql-reference/sql-conditional-expressions/010-case-expression.md
+++ b/_docs/sql-reference/sql-conditional-expressions/010-case-expression.md
@@ -25,7 +25,9 @@ Drill processes the CASE expression as follows:

## Example

-The Drill installation includes sample files in the Drill classpath. You use the classpath (cp) storage plugin point to these files. One sample file, `employee.json`, contains some fictitious employee data that you query in this example. The employee having an ID of 99 is Elizabeth Horne. The employee having a ID of 100 is Mary Hunt. The example CASE statement gets the first name of the employee 99 and the last name of employee 100. Any other employee ID does not meet the condition; the ID is too high.
+The Drill installation includes [`employee.json`]({{site.baseurl}}/docs/querying-json-files/) in the Drill classpath that this example queries. You use the classpath (cp) storage plugin to point to this file.
+
+The employee having an ID of 99 is Elizabeth Horne. The employee having an ID of 100 is Mary Hunt. The example CASE statement gets the first name of employee 99 and the last name of employee 100. Any other employee ID does not meet the condition; the ID is too high.

    USE cp;
    SELECT employee_id,
http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-functions/020-data-type-conversion.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-functions/020-data-type-conversion.md b/_docs/sql-reference/sql-functions/020-data-type-conversion.md
index b0eb6aa..ad7463a 100644
--- a/_docs/sql-reference/sql-functions/020-data-type-conversion.md
+++ b/_docs/sql-reference/sql-functions/020-data-type-conversion.md
@@ -383,7 +383,8 @@ First, you set the storage format to JSON. Next, you use the CREATE TABLE AS (CT

## STRING_BINARY function

-Converts a VARBINARY value into a string that is its hexadecimal encoding.
+Prints the bytes that are printable, and prints a hexadecimal
+representation for bytes that are not printable. This function is modeled after the HBase utilities.

### STRING_BINARY Syntax

@@ -391,7 +392,7 @@

*expression* is a byte array, such as {(byte)0xca, (byte)0xfe, (byte)0xba, (byte)0xbe}

-This function returns a hexadecimal string, such as `"\xca\xfe\xba\xbe"`. You can use this function with CONVERT_TO when you want to test the effects of a conversion.
+You can use this function with CONVERT_TO when you want to test the effects of a conversion.
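For instance, a quick sanity check might look like the following sketch (the `INT` encoding name for CONVERT_TO and the little-endian byte layout are assumptions based on Drill defaults):

    SELECT STRING_BINARY(CONVERT_TO(1, 'INT')) FROM (VALUES(1));

Because none of the four bytes of the little-endian integer is printable, the result should be rendered entirely in hexadecimal, such as `\x01\x00\x00\x00`.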
### STRING_BINARY Examples

@@ -442,7 +443,7 @@ Converts a string that is the hexadecimal encoding of a sequence of bytes into a

*expression* is a hexadecimal string, such as `"\xca\xfe\xba\xbe"`.

-This function returns a byte array, such as {(byte)0xca, (byte)0xfe, (byte)0xba, (byte)0xbe}. You can use this function with CONVERT_FROM for readable results.
+This function returns a byte array, such as {(byte)0xca, (byte)0xfe, (byte)0xba, (byte)0xbe}.

### BINARY_STRING Example

@@ -454,7 +455,7 @@ Converts a VARBINARY type into a hexadecimal-encoded string.

*expression* is a byte array, such as {(byte)0xca, (byte)0xfe, (byte)0xba, (byte)0xbe}.

-This function returns a hexadecimal-encoded string, such as `"\xca\xfe\xba\xbe"`. You can use this function with CONVERT_TO for meaningful results.
+This function returns a hexadecimal-encoded string, such as `"\xca\xfe\xba\xbe"`. You can use this function with CONVERT_FROM/TO for meaningful results.

### BINARY_STRING Examples
http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md b/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md
index f23da74..36e65cd 100644
--- a/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md
+++ b/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md
@@ -28,7 +28,7 @@ Returns the interval between two timestamps or subtracts a timestamp from midnig

### AGE Syntax

-    AGE (timestamp[, timestamp])
+`AGE (timestamp[, timestamp])`

*timestamp* is the date and time formatted as shown in the following examples.

Find the interval between midnight today, April 3, 2015, and June 13, 1957.

    +------------+
    1 row selected (0.064 seconds)

-Find the interval between midnight today, May 21, 2015, and hire dates of employees 578 and 761 in the `employees.json` file. The file is installed with Drill and located in the Drill classpath.
+Find the interval between midnight today, May 21, 2015, and hire dates of employees 578 and 761 in the [`employee.json`]({{site.baseurl}}/docs/querying-json-files/) file installed with Drill and located in the Drill classpath.

    SELECT AGE(CAST(hire_date AS TIMESTAMP)) FROM cp.`employee.json` where employee_id IN( '578','761');
    +------------------+

@@ -75,13 +75,10 @@ Returns the sum of a date/time and a number of days/hours, or of a date/time and

### DATE_ADD Syntax

-    DATE_ADD(keyword literal, integer)
-
-    DATE_ADD(keyword literal, interval expr)
-
-    DATE_ADD(column, integer)
-
-    DATE_ADD(column, interval expr)
+`DATE_ADD(keyword literal, integer)`
+`DATE_ADD(keyword literal, interval expr)`
+`DATE_ADD(column, integer)`
+`DATE_ADD(column, interval expr)`

*keyword* is the word date, time, or timestamp.
*literal* is a date, time, or timestamp literal. For example, a date in yyyy-mm-dd format enclosed in single quotation marks.

@@ -197,7 +194,7 @@ Returns a field of a date, time, timestamp, or interval.

### DATE_PART Syntax

-    date_part(keyword, expression)
+`date_part(keyword, expression)`

*keyword* is year, month, day, hour, minute, or second enclosed in single quotation marks.
*expression* is date, time, timestamp, or interval literal enclosed in single quotation marks.
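For example, the following sketch extracts the day component from a date passed as a string literal (the string-literal argument form is assumed from the description above); it should return 2:

    SELECT DATE_PART('day', '2015-04-02') FROM (VALUES(1));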
@@ -238,13 +235,10 @@ Returns the difference between a date/time and a number of days/hours, or betwee

### DATE_SUB Syntax

-    DATE_SUB(keyword literal, integer)
-
-    DATE_SUB(keyword literal, interval expr)
-
-    DATE_ADD(column, integer)
-
-    DATE_SUB(column, interval expr)
+`DATE_SUB(keyword literal, integer)`
+`DATE_SUB(keyword literal, interval expr)`
+`DATE_SUB(column, integer)`
+`DATE_SUB(column, interval expr)`

*keyword* is the word date, time, or timestamp.
*literal* is a date, time, or timestamp literal. For example, a date in yyyy-mm-dd format enclosed in single quotation marks.
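As a quick illustration of the literal form, the following sketch should subtract two days from the date and return 2015-04-13 (the `interval '2' day` literal syntax is an assumption based on the INTERVAL examples elsewhere on this page):

    SELECT DATE_SUB(date '2015-04-15', interval '2' day) FROM (VALUES(1));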
@@ -431,7 +425,7 @@ Returns a component of a timestamp, time, date, or interval.

### EXTRACT Syntax

-    EXTRACT (extract_expression)
+`EXTRACT (extract_expression)`

*extract_expression* is:

@@ -459,7 +453,6 @@ On the third day of the month, run the following function:

At 8:00 am, extract the hour from the value of CURRENT_DATE.

    SELECT EXTRACT(hour FROM CURRENT_DATE) FROM (VALUES(1));
-
    +------------+
    | EXPR$0 |
    +------------+

@@ -469,7 +462,6 @@ What is the hour component of this time: 17:12:28.5?

    SELECT EXTRACT(hour FROM TIME '17:12:28.5') FROM (VALUES(1));
-
    +------------+
    | EXPR$0 |
    +------------+

@@ -480,7 +472,6 @@ What is the seconds component of this timestamp: 2001-02-16 20:38:40

    SELECT EXTRACT(SECOND FROM TIMESTAMP '2001-02-16 20:38:40') FROM (VALUES(1));
-
    +------------+
    | EXPR$0 |
    +------------+

@@ -527,9 +518,9 @@ Returns UNIX Epoch time, which is the number of seconds elapsed since January 1,

### UNIX_TIMESTAMP Syntax

-    UNIX_TIMESTAMP()
-    UNIX_TIMESTAMP(string date)
-    UNIX_TIMESTAMP(string date, string pattern)
+`UNIX_TIMESTAMP()`
+`UNIX_TIMESTAMP(string date)`
+`UNIX_TIMESTAMP(string date, string pattern)`

These functions perform the following operations, respectively:
http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md b/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
index ce71e8a..87c45ab 100644
--- a/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
+++ b/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
@@ -10,18 +10,18 @@ Drill queries:

**Function** | **Argument Type** | **Return Type**
-------- | ------------- | -----------
-AVG(expression)| SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVALYEAR or INTERVALDAY| DECIMAL for any integer-type argument, DOUBLE for a floating-point argument, otherwise the same as the argument data type
+AVG(expression)| SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL| DECIMAL for any integer-type argument, DOUBLE for a floating-point argument, otherwise the same as the argument data type
COUNT(*)| _-_| BIGINT
COUNT([DISTINCT] expression)| any| BIGINT
MAX(expression)| BINARY, DECIMAL, VARCHAR, DATE, TIME, or TIMESTAMP| same as argument type
MIN(expression)| BINARY, DECIMAL, VARCHAR, DATE, TIME, or TIMESTAMP| same as argument type
-SUM(expression)| SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVALDAY, or INTERVALYEAR| BIGINT for SMALLINT or INTEGER arguments, DECIMAL for BIGINT arguments, DOUBLE for floating-point arguments, otherwise the same as the argument data type
+SUM(expression)| SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL| BIGINT for SMALLINT or INTEGER arguments, DECIMAL for BIGINT arguments, DOUBLE for floating-point arguments, otherwise the same as the argument data type

\* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. You can [enable the DECIMAL type]({{site.baseurl}}/docs/supported-data-types/#enabling-the-decimal-type), but this is not recommended.

AVG, COUNT, MIN, MAX, and SUM accept ALL and DISTINCT keywords. The default is ALL.

-These examples of aggregate functions use the `cp` storage plugin to access a JSON file installed with Drill. By default, JSON reads numbers as double-precision floating point numbers. These examples assume that you are using the default option [all_text_mode]({{site.baseurl}}/docs/json-data-model/#handling-type-differences) set to false.
+These examples of aggregate functions use the `cp` storage plugin to access the [`employee.json`]({{site.baseurl}}/docs/querying-json-files/) file installed with Drill. By default, JSON reads numbers as double-precision floating point numbers. These examples assume that you are using the default option [all_text_mode]({{site.baseurl}}/docs/json-data-model/#handling-type-differences) set to false.

## AVG

@@ -110,8 +110,8 @@ Returns the number of rows that match the given criteria.

### COUNT Syntax

-    SELECT COUNT([DISTINCT | ALL] column) FROM . . .
-    SELECT COUNT(*) FROM . . .
+`SELECT COUNT([DISTINCT | ALL] column) FROM . . .`
+`SELECT COUNT(*) FROM . . .`

* column
  Returns the number of values of the specified column.

@@ -203,7 +203,7 @@ Returns the total of a numeric column.

### SUM syntax

-SUM(column)
+`SUM(column)`

### Examples
http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-window-functions/010-sql-window-functions-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-window-functions/010-sql-window-functions-introduction.md b/_docs/sql-reference/sql-window-functions/010-sql-window-functions-introduction.md
old mode 100644
new mode 100755
index c328898..6b87af2
--- a/_docs/sql-reference/sql-window-functions/010-sql-window-functions-introduction.md
+++ b/_docs/sql-reference/sql-window-functions/010-sql-window-functions-introduction.md
@@ -85,7 +85,8 @@ Aggregate Ranking

* CUME_DIST()
-* DENSE_RANK()
+* DENSE_RANK()
+* NTILE()
* PERCENT_RANK()
* RANK()
* ROW_NUMBER()

@@ -125,7 +126,8 @@ Any of the following functions used with the OVER clause to provide a window spe

* MAX()
* MIN()
* SUM()
-* DENSE_RANK()
+* DENSE_RANK()
+* NTILE()
* PERCENT_RANK()
* RANK()
* ROW_NUMBER()
http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-window-functions/030-ranking-window-functions.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-window-functions/030-ranking-window-functions.md b/_docs/sql-reference/sql-window-functions/030-ranking-window-functions.md
old mode 100644
new mode 100755
index 90753df..07d087f
--- a/_docs/sql-reference/sql-window-functions/030-ranking-window-functions.md
+++ b/_docs/sql-reference/sql-window-functions/030-ranking-window-functions.md
@@ -3,7 +3,7 @@ title: "Ranking Window Functions"
parent: "SQL Window Functions"
---

-Window functions operate on a set of rows and return a single value for each row from the underlying query.
The OVER() clause differentiates window functions from other analytical and reporting functions. See [SQL Window Functions Introduction]({{site.baseurl}}/docs/sql-window-functions-introduction/). You can use ranking functions in Drill to return a ranking value for each row in a partition. +Window functions operate on a set of rows and return a single value for each row from the underlying query. The OVER() clause differentiates window functions from other analytical and reporting functions. See [SQL Window Functions Introduction]({{ site.baseurl }}/docs/sql-window-functions-introduction/). You can use ranking functions in Drill to return a ranking value for each row in a partition. The following table lists the ranking window functions with supported data types and descriptions: @@ -11,10 +11,13 @@ The following table lists the ranking window functions with supported data types |-----------------|------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | CUME_DIST() | DOUBLE PRECISION | The CUME_DIST() window function calculates the relative rank of the current row within a window partition: (number of rows preceding or peer with current row) / (total rows in the window partition) | | DENSE_RANK() | BIGINT | The DENSE_RANK () window function determines the rank of a value in a group of values based on the ORDER BY expression and the OVER clause. Each value is ranked within its partition. Rows with equal values receive the same rank. There are no gaps in the sequence of ranked values if two or more rows have the same rank. | +| NTILE() | INTEGER | The NTILE window function divides the rows for each window partition, as equally as possible, into a specified number of ranked groups. The NTILE window function requires the ORDER BY clause in the OVER clause. | | PERCENT_RANK() | DOUBLE PRECISION | The PERCENT_RANK () window function calculates the percent rank of the current row using the following formula: (x - 1) / (number of rows in window partition - 1) where x is the rank of the current row. | | RANK() | BIGINT | The RANK window function determines the rank of a value in a group of values. The ORDER BY expression in the OVER clause determines the value. Each value is ranked within its partition. Rows with equal values for the ranking criteria receive the same rank. Drill adds the number of tied rows to the tied rank to calculate the next rank and thus the ranks might not be consecutive numbers. For example, if two rows are ranked 1, the next rank is 3. The DENSE_RANK window function differs in that no gaps exist if two or more rows tie. | | ROW_NUMBER() | BIGINT | The ROW_NUMBER window function determines the ordinal number of the current row within its partition. The ORDER BY expression in the OVER clause determines the number. Each value is ordered within its partition. Rows with equal values for the ORDER BY expressions receive different row numbers nondeterministically. 
|
+
+
## Syntax

    window_function () OVER clause

@@ -23,7 +26,7 @@

## Arguments

-*window_function*
+*window\_function*
One of the following supported ranking functions:
CUME_DIST(), DENSE_RANK(), NTILE(), PERCENT_RANK(), RANK(), ROW_NUMBER()

()
The functions do not take arguments; however, the empty parentheses are required.

OVER clause
-The window clauses for the function. The OVER clause cannot contain an explicit frame specification, but must include an ORDER BY clause. See [Window Function Syntax]({{site.baseurl}}/docs/sql-window-functions-introduction/#syntax) for OVER clause syntax.
+The window clauses for the function. The OVER clause cannot contain an explicit frame specification, but must include an ORDER BY clause. See [Window Function Syntax]({{ site.baseurl }}/docs/sql-window-functions-introduction/#syntax) for OVER clause syntax.


## Examples

-The following examples show queries that use each of the ranking window functions in Drill. See [Window Functions Examples]({{site.baseurl}}/docs/sql-window-functions-examples/) for information about the data and setup for these examples.
+The following examples show queries that use each of the ranking window functions in Drill. See [Window Functions Examples]({{ site.baseurl }}/docs/sql-window-functions-examples/) for information about the data and setup for these examples.

### CUME_DIST()

@@ -78,7 +81,48 @@ The following query uses the DENSE_RANK() window function to rank the employee s
| 1 | Ferris Brown | 19745 | 7 |
| 1 | Noel Meyer | 19745 | 7 |
+------------+-----------------+--------+------------+
-    10 rows selected (0.198 seconds)
+    10 rows selected (0.198 seconds)
+
+###NTILE()
+
+The following example uses the NTILE window function to divide the Q1 sales into five groups and list the sales in ascending order.
+
+    select emp_mgr, sales, ntile(5) over(order by sales) as ntilerank from q1_sales;
+    +-----------------+--------+------------+
+    | emp_mgr | sales | ntilerank |
+    +-----------------+--------+------------+
+    | Kari Phelps | 8227 | 1 |
+    | Rich Hernandez | 9308 | 1 |
+    | Kari Phelps | 9710 | 2 |
+    | Rich Hernandez | 12369 | 2 |
+    | Mike Palomino | 13181 | 3 |
+    | Rich Hernandez | 15427 | 3 |
+    | Kari Phelps | 15547 | 4 |
+    | Mike Palomino | 16233 | 4 |
+    | Dan Brodi | 19745 | 5 |
+    | Mike Palomino | 23176 | 5 |
+    +-----------------+--------+------------+
+    10 rows selected (0.149 seconds)
+
+The following example partitions sales by dealer_id and uses the NTILE window function to divide the rows in each partition into three groups; dealer 1 has a remainder, which is added to the first group.
+ + select emp_mgr, dealer_id, sales, ntile(3) over(partition by dealer_id order by sales) as ntilerank from q1_sales; + +-----------------+------------+--------+------------+ + | emp_mgr | dealer_id | sales | ntilerank | + +-----------------+------------+--------+------------+ + | Kari Phelps | 1 | 8227 | 1 | + | Kari Phelps | 1 | 9710 | 1 | + | Kari Phelps | 1 | 15547 | 2 | + | Dan Brodi | 1 | 19745 | 3 | + | Mike Palomino | 2 | 13181 | 1 | + | Mike Palomino | 2 | 16233 | 2 | + | Mike Palomino | 2 | 23176 | 3 | + | Rich Hernandez | 3 | 9308 | 1 | + | Rich Hernandez | 3 | 12369 | 2 | + | Rich Hernandez | 3 | 15427 | 3 | + +-----------------+------------+--------+------------+ + 10 rows selected (0.312 seconds) + ### PERCENT_RANK() http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/sql-reference/sql-window-functions/035-value-window-functions.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-window-functions/035-value-window-functions.md b/_docs/sql-reference/sql-window-functions/035-value-window-functions.md new file mode 100755 index 0000000..1f64023 --- /dev/null +++ b/_docs/sql-reference/sql-window-functions/035-value-window-functions.md @@ -0,0 +1,131 @@ +--- +title: "Value Window Functions" +parent: "SQL Window Functions" +--- +Window functions operate on a set of rows and return a single value for each row from the underlying query. The OVER() clause differentiates window functions from other analytical and reporting functions. See [SQL Window Functions Introduction]({{ site.baseurl }}/docs/sql-window-functions-introduction/). + +The following table lists the value window functions with supported data types and descriptions: + +| Window Function | Argument Type | Return Type | Description | +|-----------------|--------------------------------|------------------------------|------------------------------------------------------------------------------------------------------------------------------------| +| LAG() | Any supported Drill data types | Same as the expression type | The LAG() window function returns the value for the row before the current row in a partition. If no row exists, null is returned. | +| LEAD() | Any supported Drill data types | Same as the expression type | The LEAD() window function returns the value for the row after the current row in a partition. If no row exists, null is returned. | +| FIRST_VALUE | Any supported Drill data types | Same as the expression type | The FIRST_VALUE window function returns the value of the specified expression with respect to the first row in the window frame. | +| LAST_VALUE | Any supported Drill data types | Same as the expression type | The LAST_VALUE window function returns the value of the specified expression with respect to the last row in the window frame. | + +##Syntax + +###LAG | LEAD + + LAG | LEAD + ( expression ) + OVER ( [ PARTITION BY expr_list ] [ ORDER BY order_list ] ) + +###FIRST\_VALUE | LAST_VALUE + + FIRST_VALUE | LAST_VALUE + ( expression ) OVER + ( [ PARTITION BY expr_list ] [ ORDER BY order_list ][ frame_clause ] ) + +##Arguments + +*expression* +The target column or expression that the function operates on. + +OVER +Specifies the window partitioning and ordering. The OVER clause cannot contain a window frame specification. + +PARTITION BY *expr_list* +Defines the window for the window function in terms of one or more expressions. + +ORDER BY *order_list* +Sorts the rows within each partition. 
If PARTITION BY is not specified, ORDER BY uses the entire table. + +*frame_clause* +The frame clause refines the set of rows in a function's window, including or excluding sets of rows within the ordered result. The frame clause consists of the ROWS or RANGE keyword and associated specifiers. + +##Examples +The following examples show queries that use each of the value window functions in Drill. + +###LAG() +The following example uses the LAG window function to show the quantity of records sold to the Tower Records customer with customer ID 8 and the dates that customer 8 purchased records. To compare each sale with the previous sale for customer 8, the query returns the previous quantity sold for each sale. Since there is no purchase before 1976-01-25, the first previous quantity sold value is null. Note that the term "date" in the query is enclosed in back ticks because it is a reserved keyword in Drill. + + select cust_id, `date`, qty_sold, lag(qty_sold,1) over (order by cust_id, `date`) as prev_qtysold from sales where cust_id = 8 order by cust_id, `date`; + + +----------+-------------+-----------+---------------+ + | cust_id | date | qty_sold | prev_qtysold | + +----------+-------------+-----------+---------------+ + | 8 | 1976-01-25 | 2 | null | + | 8 | 1981-02-04 | 5 | 2 | + | 8 | 1982-08-09 | 2 | 5 | + | 8 | 1983-02-12 | 1 | 2 | + | 8 | 1984-02-10 | 9 | 1 | + +----------+-------------+-----------+---------------+ + 5 rows selected (0.331 seconds) + +###LEAD() +The following example uses the LEAD window function to provide the commission for concert tickets with show ID 172 and the next commission for subsequent ticket sales. Since there is no commission after 40.00, the last next_comm value is null. Note that the term "date" in the query is enclosed in back ticks because it is a reserved keyword in Drill. 
+ + select show_id, `date`, commission, lead(commission,1) over (order by `date`) as next_comm from commission where show_id = 172; + +----------+-------------+-------------+------------+ + | show_id | date | commission | next_comm | + +----------+-------------+-------------+------------+ + | 172 | 1979-01-01 | 29.20 | 29.50 | + | 172 | 1979-01-01 | 29.50 | 8.25 | + | 172 | 1979-01-01 | 8.25 | 15.50 | + | 172 | 1979-01-01 | 15.50 | 10.25 | + | 172 | 1979-01-01 | 10.25 | 4.40 | + | 172 | 1979-01-01 | 4.40 | 80.20 | + | 172 | 1979-01-01 | 80.20 | 90.10 | + | 172 | 1979-01-02 | 90.10 | 25.50 | + | 172 | 1979-01-02 | 25.50 | 50.00 | + | 172 | 1979-01-02 | 50.00 | 20.20 | + | 172 | 1979-01-02 | 20.20 | 40.00 | + | 172 | 1979-01-02 | 40.00 | null | + +----------+-------------+-------------+------------+ + 12 rows selected (0.241 seconds) + +###FIRST_VALUE() +The following example uses the FIRST_VALUE window function to identify the employee with the lowest sales for each dealer in Q1: + + select emp_name, dealer_id, sales, first_value(sales) over (partition by dealer_id order by sales) as dealer_low from q1_sales; + +-----------------+------------+--------+-------------+ + | emp_name | dealer_id | sales | dealer_low | + +-----------------+------------+--------+-------------+ + | Raphael Hull | 1 | 8227 | 8227 | + | Jack Salazar | 1 | 9710 | 8227 | + | Ferris Brown | 1 | 19745 | 8227 | + | Noel Meyer | 1 | 19745 | 8227 | + | Haviva Montoya | 2 | 9308 | 9308 | + | Beverly Lang | 2 | 16233 | 9308 | + | Kameko French | 2 | 16233 | 9308 | + | May Stout | 3 | 9308 | 9308 | + | Abel Kim | 3 | 12369 | 9308 | + | Ursa George | 3 | 15427 | 9308 | + +-----------------+------------+--------+-------------+ + 10 rows selected (0.299 seconds) + + +###LAST_VALUE() +The following example uses the LAST_VALUE window function to identify the last car sale each employee made at each dealership in 2013: + + select emp_name, dealer_id, sales, `year`, last_value(sales) over (partition by emp_name order by `year`) as last_sale from emp_sales where `year` = 2013; + +-----------------+------------+--------+-------+------------+ + | emp_name | dealer_id | sales | year | last_sale | + +-----------------+------------+--------+-------+------------+ + | Beverly Lang | 2 | 5324 | 2013 | 5324 | + | Ferris Brown | 1 | 22003 | 2013 | 22003 | + | Haviva Montoya | 2 | 6345 | 2013 | 13100 | + | Haviva Montoya | 2 | 13100 | 2013 | 13100 | + | Kameko French | 2 | 7540 | 2013 | 7540 | + | May Stout | 2 | 4924 | 2013 | 15000 | + | May Stout | 2 | 8000 | 2013 | 15000 | + | May Stout | 2 | 15000 | 2013 | 15000 | + | Noel Meyer | 1 | 13314 | 2013 | 13314 | + | Raphael Hull | 1 | -4000 | 2013 | 14000 | + | Raphael Hull | 1 | 14000 | 2013 | 14000 | + | Ursa George | 1 | 10865 | 2013 | 10865 | + +-----------------+------------+--------+-------+------------+ + 12 rows selected (0.284 seconds) + + http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/tutorials/020-drill-in-10-minutes.md ---------------------------------------------------------------------- diff --git a/_docs/tutorials/020-drill-in-10-minutes.md b/_docs/tutorials/020-drill-in-10-minutes.md index ed2ca25..d855b2e 100755 --- a/_docs/tutorials/020-drill-in-10-minutes.md +++ b/_docs/tutorials/020-drill-in-10-minutes.md @@ -5,7 +5,7 @@ description: Get started with Drill in 10 minutes or less. --- ## Objective -Use Apache Drill to query sample data in 10 minutes. For simplicity, you’ll +Use Apache Drill to query sample data in 10 minutes. 
For simplicity, you run
Drill in _embedded_ mode rather than _distributed_ mode to try out Drill
without having to perform any setup tasks.

@@ -52,9 +52,9 @@ Complete the following steps to install Drill:

3. Copy the downloaded file to the directory where you want to install Drill.

-4. Extract the contents of the Drill tar.gz file. Use sudo if necessary:
+4. Extract the contents of the Drill .tar.gz file. Use sudo if necessary:

-    `tar -xvzf apache-drill-1.1.0.tar.gz`
+    `tar -xvzf <.tar.gz file name>`

   The extraction process creates an installation directory containing the Drill software.

@@ -65,13 +65,13 @@ Start Drill in embedded mode using the `drill-embedded` command:

1. Navigate to the Drill installation directory. For example:

-    `cd apache-drill-1.1.0`
+    `cd apache-drill-<version>`

2. Issue the following command to launch Drill in embedded mode:

   `bin/drill-embedded`

-   The message of the day followed by the `0: jdbc:drill:zk=local>` prompt appears.
+   The message of the day followed by the [`0: jdbc:drill:zk=local>` prompt]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/#about-the-drill-prompt) appears.

   At this point, you can [submit queries]({{site.baseurl}}/docs/drill-in-10-minutes#query-sample-data) to Drill.

@@ -80,27 +80,25 @@

You can install Drill on Windows 7 or 8. First, set the JAVA_HOME environment variable, and then install Drill. Complete the following steps to install Drill:

1. Click the following link to download the latest version of Apache Drill: [http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz](http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz)
-2. Move the `apache-drill-1.1.0.tar.gz` file to a directory where you want to install Drill.
-3. Unzip the `TAR.GZ` file using a third-party tool. If the tool you use does not unzip the TAR file as well as the `TAR.GZ` file, unzip the `apache-drill-1.1.0.tar` to extract the Drill software. The extraction process creates the installation directory named apache-drill-1.1.0 containing the Drill software. For example:
-   ![drill install dir]({{ site.baseurl }}/docs/img/drill-directory.png)
+2. Move the `apache-drill-<version>.tar.gz` file to a directory where you want to install Drill.
+3. Unzip the `TAR.GZ` file using a third-party tool. If the tool you use does not unzip the TAR file as well as the `TAR.GZ` file, unzip the `apache-drill-<version>.tar` file to extract the Drill software. The extraction process creates the installation directory named apache-drill-<version> containing the Drill software.

At this point, you can start Drill.

## Start Drill on Windows

Start Drill by running the sqlline.bat file and typing a connection string, as shown in the following procedure. The `zk=local` in the connection string means the local node is the ZooKeeper node:

-Start the Drill shell using the **sqlline command**. The `zk=local` means the local node is the ZooKeeper node. Complete the following steps to launch the Drill shell:
+Start the Drill shell using the **sqlline command**. Complete the following steps to launch the Drill shell:

1. Open Command Prompt.
-2. Open the apache-drill-1.1.0 folder. For example:
-   ``cd apache-drill-1.1.0``
+2. Open the apache-drill-<version> folder.
3. Go to the bin directory. For example:
   ``cd bin``
4. Type the following command on the command line:
   ``sqlline.bat -u "jdbc:drill:zk=local"``
  ![drill install dir]({{ site.baseurl }}/docs/img/sqlline1.png)

-At this point, you can [run queries]({{ site.baseurl }}/docs/drill-in-10-minutes#query-sample-data).
+The `zk=local` means the local node is the ZooKeeper node. At this point, you can [run queries]({{ site.baseurl }}/docs/drill-in-10-minutes#query-sample-data).

## Stopping Drill

@@ -110,19 +108,17 @@ Issue the following command when you want to exit the Drill shell:

## Query Sample Data

-Your Drill installation includes a `sample-data` directory with JSON and
+At the root of the Drill installation, a `sample-data` directory includes JSON and
Parquet files that you can query. The default `dfs` storage plugin configuration represents the local file
system on your machine when you install Drill in embedded mode. For more
information about storage plugin configuration, refer to [Storage Plugin
Registration]({{ site.baseurl }}/docs/connect-a-data-source-introduction).

-Use SQL syntax to query the sample `JSON` and `Parquet` files in the `sample-data` directory on your local file system.
+Use SQL to query the sample `JSON` and `Parquet` files in the `sample-data` directory on your local file system.

### Querying a JSON File

-A sample JSON file, `employee.json`, contains fictitious employee data.
-
-To view the data in the `employee.json` file, submit the following SQL query
-to Drill, using the [cp (classpath) storage plugin]({{site.baseurl}}/docs/storage-plugin-registration/) to point to the file.
+A sample JSON file, [`employee.json`]({{site.baseurl}}/docs/querying-json-files/), contains fictitious employee data. To view the data in the `employee.json` file, submit the following SQL query
+to Drill, using the [cp (classpath) storage plugin]({{site.baseurl}}/docs/storage-plugin-registration/) configuration to point to the file.

`0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 3;`

@@ -148,10 +144,7 @@ If you followed the Apache Drill in 10 Minutes instructions to install Drill
in embedded mode, the path to the parquet file varies between operating
systems.

-{% include startnote.html %}Substitute your installation path and the Drill version in the angle-bracketed locations when you enter the query.{% include endnote.html %}
-
-To view the data in the `region.parquet` file, issue the query appropriate for
-your operating system:
+To view the data in the `region.parquet` file, use the actual path to your Drill installation to construct this query:

``SELECT * FROM dfs.`/apache-drill-<version>/sample-data/region.parquet`;``

The query returns the following results:

#### Nation File

-If you followed the Apache Drill in 10 Minutes instructions to install Drill
-in embedded mode, the path to the parquet file varies between operating
-systems.
-
-{% include startnote.html %}Substitute your installation path and the Drill version in the angle-bracketed locations when you enter the query{% include endnote.html %}
-
-To view the data in the `nation.parquet` file, issue the query appropriate for
-your operating system:
+The path to the parquet file varies between operating
+systems. Use the actual path to your Drill installation to construct this query:

``SELECT * FROM dfs.`/apache-drill-<version>/sample-data/nation.parquet`;``

The query returns the following results:

-    SELECT * FROM dfs.`Users/khahn/drill/apache-drill-1.1.0-SNAPSHOT/sample-data/nation.parquet`;
+    SELECT * FROM dfs.`Users/drilluser/apache-drill-1.1.0/sample-data/nation.parquet`;
    +--------------+-----------------+--------------+-----------------------+
    | N_NATIONKEY | N_NAME | N_REGIONKEY | N_COMMENT |
    +--------------+-----------------+--------------+-----------------------+

@@ -217,13 +204,7 @@ The query returns the following results:

## Summary

-Now, you have been introduced to Apache Drill, which supports nested data, schema-less execution, and decentralized metadata. To summarize, you have completed the following tasks:
-
- * Downloaded and installed Apache Drill.
- * Started Drill in embedded mode.
- * Queried the sample JSON file, `employee.json`, to view its data.
- * Queried the sample `region.parquet` file to view its data.
- * Queried the sample `nation.parquet` file to view its data.
+Apache Drill supports nested data, schema-less execution, and decentralized metadata. At this point, you know how to run a simple query against a JSON or Parquet file.

## Next Steps

@@ -237,5 +218,4 @@ Now that you have an idea about what Drill can do, you might want to:

## More Information

-For more information about Apache Drill, explore the [Apache Drill
-web site](http://drill.apache.org).
\ No newline at end of file
+For more information about Apache Drill, explore the [Apache Drill web site](http://drill.apache.org).
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/43d6dd1d/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md
----------------------------------------------------------------------
diff --git a/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md b/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md
index 4a0b6aa..6139b9c 100644
--- a/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md
+++ b/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md
@@ -20,7 +20,7 @@ example is downloadable from [Yelp](http://www.yelp.com/dataset_challenge)

## Installing and Starting Drill

-### Step 1: Download Apache Drill onto your local machine
+### Download Apache Drill onto your local machine

To experiment with Drill locally, follow the installation instructions in
[Drill in 10 Minutes]({{site.baseurl}}/docs/drill-in-10-minutes/).
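Once Drill is running, a first exploratory query against the Yelp data might look like the following sketch (the extraction path `/tmp/yelp` and the archive's file name are assumptions; substitute the location where you unpacked the dataset):

    SELECT count(*) FROM dfs.`/tmp/yelp/yelp_academic_dataset_business.json`;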