Mailing-List: contact commits-help@drill.apache.org; run by ezmlm
Precedence: bulk
Reply-To: commits@drill.apache.org
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: bridgetb@apache.org
To: commits@drill.apache.org
Date: Wed, 25 Nov 2015 22:03:16 -0000
Message-Id: <2d3e386116dc490185fe8f96caef4e6a@git.apache.org>
In-Reply-To: <6b23d18928664538ae260ceb1af9fdb8@git.apache.org>
References: <6b23d18928664538ae260ceb1af9fdb8@git.apache.org>
Subject: [28/31] drill git commit: tweak Jason's pull request

tweak Jason's pull request

tweak Jason's migration writeup

wordsmith migration note

fix last sentence

minor edits

editorial

shorten title

typos

move xref out of note


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/1ae0e4eb
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/1ae0e4eb
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/1ae0e4eb

Branch: refs/heads/gh-pages
Commit: 1ae0e4ebca6f8d0bbd42fb6d2b1c6d8c1583aed0
Parents: d31e2ec
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Tue Nov 24 18:24:09 2015 -0800
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Wed Nov 25 10:39:15 2015 -0800

----------------------------------------------------------------------
 .../plugins/080-rdbms-storage-plugin.md                   |  3 ++-
 _docs/performance-tuning/020-partition-pruning.md         |  9 +++++++++
 .../sql-reference/sql-commands/035-partition-by-clause.md | 10 +++++++---
 3 files changed, 18 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md b/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md
index 570aaf1..ed7bf7b 100644
--- a/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md
+++ b/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md
@@ -27,7 +27,8 @@ To configure the JDBC storage plugin:
 1. On the Storage tab, enter a name in **New Storage Plugin**. For example, enter `myplugin`.
    Each configuration registered with Drill must have a distinct name. Names are case-sensitive.  
 
-    {% include startnote.html %}The URL differs depending on your installation and configuration. See the [example configurations](#Example-Configurations) below for examples.{% include endnote.html %}  
+    {% include startnote.html %}The URL differs depending on your installation and configuration.{% include endnote.html %}  
+    See the [example configurations](#Example-Configurations) below.  
 1. Click **Create**.  
 1. In Configuration, set the required properties using JSON formatting as shown in the following example. Change the properties to match your environment.  
 

http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/performance-tuning/020-partition-pruning.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/020-partition-pruning.md b/_docs/performance-tuning/020-partition-pruning.md
index c86620c..26f681f 100755
--- a/_docs/performance-tuning/020-partition-pruning.md
+++ b/_docs/performance-tuning/020-partition-pruning.md
@@ -7,6 +7,15 @@ Partition pruning is a performance optimization that limits the number of files
  
 The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
 
+## Migrating Partitioned Data from Drill 1.1-1.2 to Drill 1.3
+Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade) to migrate Parquet data that you generated in Drill 1.1 or 1.2 before attempting to use the data with Drill 1.3 partition pruning.  This migration is mandatory because Parquet data generated by Drill 1.1 and 1.2 must be marked as Drill-generated, as described in [DRILL-4070](https://issues.apache.org/jira/browse/DRILL-4070). 
+
+Drill 1.3 fixes a bug to accurately process Parquet files produced by other tools, such as Pig and Hive. The bug fix eliminates the risk of inaccurate metadata that could cause incorrect results when querying Hive- and Pig-generated Parquet files. No such risk exists with Drill-generated Parquet files: regardless of the Drill release, these files contain accurate metadata, and querying them yields accurate results.
+
+After you use the drill-upgrade tool to migrate your partitioned, pre-1.3 Parquet data, Drill can distinguish these files from those generated by other tools, such as Hive and Pig. Use the migration tool only on files generated by Drill.
+
+To partition and query Parquet files generated by other tools, use Drill to read and rewrite the files and metadata using the CTAS command with the PARTITION BY clause. Alternatively, use the tool that generated the original files to regenerate them, provided that the tool uses Parquet 1.8 or later.
+
 ## How to Partition Data
 
 In Drill 1.1.0 and later, if the data source is Parquet, no data organization tasks are required to take advantage of partition pruning. Write Parquet data using the [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause/) clause in the CTAS statement. 

http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/sql-reference/sql-commands/035-partition-by-clause.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-commands/035-partition-by-clause.md b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
index 20cd6a3..8ec97e3 100644
--- a/_docs/sql-reference/sql-commands/035-partition-by-clause.md
+++ b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
@@ -4,13 +4,17 @@ parent: "SQL Commands"
 ---
 The PARTITION BY clause in the CTAS command partitions data, which Drill [prunes]({{site.baseurl}}/docs/partition-pruning/) to improve performance when you query the data. (Drill 1.1.0)
 
-Note: parquet files produced using Drill 1.1 and 1.2 will need to have their metadata rewritten to work with partition pruning in 1.3 and beyond, information on the simple migration process can be found on [Github](https://github.com/parthchandra/drill-upgrade). The need for this migration process came out of a bug fix included in the 1.3 release to accurately process parquet files produced by other tools (like Pig and Hive) that had a risk of inaccurate metadata that could cause inaccurate
+{% include startnote.html %}Partitioned Parquet data generated in Drill 1.1-1.2 needs to be migrated before you attempt to use the data with partition pruning in Drill 1.3.{% include endnote.html %}
+
+See [Migrating Partitioned Data from Drill 1.1-1.2 to Drill 1.3]({{site.baseurl}}/docs/partition-pruning/#migrating-partitioned-data-from-drill-1-1-1-2-to-drill-1-3).
+
+<!-- Note: parquet files produced using Drill 1.1 and 1.2 will need to have their metadata rewritten to work with partition pruning in 1.3 and beyond, information on the simple migration process can be found on [Github](https://github.com/parthchandra/drill-upgrade). The need for this migration process came out of a bug fix included in the 1.3 release to accurately process parquet files produced by other tools (like Pig and Hive) that had a risk of inaccurate metadata that could cause inaccurate
 results. Drill results have been accurate on files it created, and the files all contain accurate metadata, the migration tool simply marks these files as having been produced by Drill. Unfortunately without this migration we cannot reliably tell them apart from files produced by other tools. The migration tool should only be used on files produced by Drill, not those produced with other software products. Data from other tools will need to be read in and completely rewritten to generate
-accurate metadata. This can be done using Drill or whatever tool originally produced them, as long as it is using a recent version of parquet (1.8 or later).
+accurate metadata. This can be done using Drill or whatever tool originally produced them, as long as it is using a recent version of parquet (1.8 or later). -->
 
 ## Syntax
 
-     [ PARTITION BY ( column_name[, . . .] ) ]
+`[ PARTITION BY ( column_name[, . . .] ) ]`
 
 The PARTITION BY clause partitions the data by the first column_name, and then subpartitions the data by the next column_name, if there is one, and so on.