From: bridgetb@apache.org To: commits@drill.apache.org Date: Wed, 25 Nov 2015 22:03:16 -0000 Message-Id: <2d3e386116dc490185fe8f96caef4e6a@git.apache.org> In-Reply-To: <6b23d18928664538ae260ceb1af9fdb8@git.apache.org> References: <6b23d18928664538ae260ceb1af9fdb8@git.apache.org> Subject: [28/31] drill git commit: tweak Jason's pull request tweak Jason's pull request tweak Jason's migration writeup wordsmith migration note fix last sentence minor edits editorial shorten title typos move xref out of note Project: http://git-wip-us.apache.org/repos/asf/drill/repo Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/1ae0e4eb Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/1ae0e4eb Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/1ae0e4eb Branch: 
refs/heads/gh-pages Commit: 1ae0e4ebca6f8d0bbd42fb6d2b1c6d8c1583aed0 Parents: d31e2ec Author: Kristine Hahn Authored: Tue Nov 24 18:24:09 2015 -0800 Committer: Kristine Hahn Committed: Wed Nov 25 10:39:15 2015 -0800 ---------------------------------------------------------------------- .../plugins/080-rdbms-storage-plugin.md | 3 ++- _docs/performance-tuning/020-partition-pruning.md | 9 +++++++++ .../sql-reference/sql-commands/035-partition-by-clause.md | 10 +++++++--- 3 files changed, 18 insertions(+), 4 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md ---------------------------------------------------------------------- diff --git a/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md b/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md index 570aaf1..ed7bf7b 100644 --- a/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md +++ b/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md @@ -27,7 +27,8 @@ To configure the JDBC storage plugin: 1. On the Storage tab, enter a name in **New Storage Plugin**. For example, enter `myplugin`. Each configuration registered with Drill must have a distinct name. Names are case-sensitive. - {% include startnote.html %}The URL differs depending on your installation and configuration. See the [example configurations](#Example-Configurations) below for examples.{% include endnote.html %} + {% include startnote.html %}The URL differs depending on your installation and configuration.{% include endnote.html %} + See the [example configurations](#Example-Configurations) below for examples. 1. Click **Create**. 1. In Configuration, set the required properties using JSON formatting as shown in the following example. Change the properties to match your environment. 
http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/performance-tuning/020-partition-pruning.md ---------------------------------------------------------------------- diff --git a/_docs/performance-tuning/020-partition-pruning.md b/_docs/performance-tuning/020-partition-pruning.md index c86620c..26f681f 100755 --- a/_docs/performance-tuning/020-partition-pruning.md +++ b/_docs/performance-tuning/020-partition-pruning.md @@ -7,6 +7,15 @@ Partition pruning is a performance optimization that limits the number of files The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O. +## Migrating Partitioned Data from Drill 1.1-1.2 to Drill 1.3 +Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade) to migrate Parquet data that you generated in Drill 1.1 or 1.2 before attempting to use the data with Drill 1.3 partition pruning. This migration is mandatory because Parquet data generated by Drill 1.1 and 1.2 must be marked as Drill-generated, as described in [DRILL-4070](https://issues.apache.org/jira/browse/DRILL-4070). + +Drill 1.3 fixes a bug to accurately process Parquet files produced by other tools, such as Pig and Hive. The bug fix eliminated the risk of inaccurate metadata that could cause incorrect results when querying Hive- and Pig-generated Parquet files. No such risk exists with Drill-generated Parquet files. Querying Drill-generated Parquet files, regardless of the Drill version, yields accurate results. Drill-generated Parquet files, regardless of the Drill release, contain accurate metadata. 
+ +After using the drill-upgrade tool to migrate your partitioned, pre-1.3 Parquet data, Drill can distinguish these files from those generated by other tools, such as Hive and Pig. Use the migration tool only on files generated by Drill. + +To partition and query Parquet files generated from other tools, use Drill to read and rewrite the files and metadata using the CTAS command with the PARTITION BY clause. Alternatively, use the tool that generated the original files to regenerate Parquet 1.8 or later files. + ## How to Partition Data In Drill 1.1.0 and later, if the data source is Parquet, no data organization tasks are required to take advantage of partition pruning. Write Parquet data using the [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause/) clause in the CTAS statement. http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/sql-reference/sql-commands/035-partition-by-clause.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-commands/035-partition-by-clause.md b/_docs/sql-reference/sql-commands/035-partition-by-clause.md index 20cd6a3..8ec97e3 100644 --- a/_docs/sql-reference/sql-commands/035-partition-by-clause.md +++ b/_docs/sql-reference/sql-commands/035-partition-by-clause.md @@ -4,13 +4,17 @@ parent: "SQL Commands" --- The PARTITION BY clause in the CTAS command partitions data, which Drill [prunes]({{site.baseurl}}/docs/partition-pruning/) to improve performance when you query the data. (Drill 1.1.0) -Note: parquet files produced using Drill 1.1 and 1.2 will need to have their metadata rewritten to work with partition pruning in 1.3 and beyond, information on the simple migration process can be found on [Github](https://github.com/parthchandra/drill-upgrade). 
The need for this migration process came out of a bug fix included in the 1.3 release to accurately process parquet files produced by other tools (like Pig and Hive) that had a risk of inaccurate metadata that could cause inaccurate +{% include startnote.html %}Partitioned data generated in Drill 1.1-1.2 needs to be migrated to Drill 1.3 before attempting to use the data.{% include endnote.html %} + +See [Migrating Partitioned Data from Drill 1.1-1.2 to Drill 1.3]({{site.baseurl}}/docs/partition-pruning/#migrating-partitioned-parquet-data-from-drill-1-1-1-2-to-drill-1-3). + + ## Syntax - [ PARTITION BY ( column_name[, . . .] ) ] +`[ PARTITION BY ( column_name[, . . .] ) ]` The PARTITION BY clause partitions the data by the first column_name, and then subpartitions the data by the next column_name, if there is one, and so on.
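
As an illustration of the syntax above, here is a minimal sketch of a CTAS statement with PARTITION BY, followed by a query that filters on the partition column. The workspace `dfs.tmp` and the table name `region_by_key` are assumptions chosen for illustration, and the sketch assumes the TPC-H sample file is available on the classpath (`cp`) and that `store.format` is set to Parquet. Note that each column named in PARTITION BY also appears in the SELECT list.

```sql
-- Sketch only: dfs.tmp and region_by_key are hypothetical names;
-- cp.`tpch/region.parquet` is assumed to be the sample data on the classpath.
CREATE TABLE dfs.tmp.`region_by_key`
PARTITION BY (r_regionkey)
AS SELECT r_regionkey, r_name
FROM cp.`tpch/region.parquet`;

-- A filter on the partition column gives the planner a chance to prune,
-- so the scan can skip files that hold other r_regionkey values.
SELECT r_name
FROM dfs.tmp.`region_by_key`
WHERE r_regionkey = 1;
```

To subpartition, list more columns in the clause, for example `PARTITION BY (col_a, col_b)` with both columns in the SELECT list.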