drill-commits mailing list archives

From bridg...@apache.org
Subject [28/31] drill git commit: tweak Jason's pull request
Date Wed, 25 Nov 2015 22:03:16 GMT
tweak Jason's pull request

tweak Jason's migration writeup

wordsmith migration note

fix last sentence

minor edits

editorial

shorten title

typos

move xref out of note


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/1ae0e4eb
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/1ae0e4eb
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/1ae0e4eb

Branch: refs/heads/gh-pages
Commit: 1ae0e4ebca6f8d0bbd42fb6d2b1c6d8c1583aed0
Parents: d31e2ec
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Tue Nov 24 18:24:09 2015 -0800
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Wed Nov 25 10:39:15 2015 -0800

----------------------------------------------------------------------
 .../plugins/080-rdbms-storage-plugin.md                   |  3 ++-
 _docs/performance-tuning/020-partition-pruning.md         |  9 +++++++++
 .../sql-reference/sql-commands/035-partition-by-clause.md | 10 +++++++---
 3 files changed, 18 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md b/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md
index 570aaf1..ed7bf7b 100644
--- a/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md
+++ b/_docs/connect-a-data-source/plugins/080-rdbms-storage-plugin.md
@@ -27,7 +27,8 @@ To configure the JDBC storage plugin:
 1. On the Storage tab, enter a name in **New Storage Plugin**. For example, enter `myplugin`.
    Each configuration registered with Drill must have a distinct name. Names are case-sensitive.
 
 
-    {% include startnote.html %}The URL differs depending on your installation and configuration. See the [example configurations](#Example-Configurations) below for examples.{% include endnote.html %}  
+    {% include startnote.html %}The URL differs depending on your installation and configuration.{% include endnote.html %}  
+    See the [example configurations](#Example-Configurations) below for examples.  
 1. Click **Create**.  
 1. In Configuration, set the required properties using JSON formatting as shown in the following example. Change the properties to match your environment.  
 

http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/performance-tuning/020-partition-pruning.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/020-partition-pruning.md b/_docs/performance-tuning/020-partition-pruning.md
index c86620c..26f681f 100755
--- a/_docs/performance-tuning/020-partition-pruning.md
+++ b/_docs/performance-tuning/020-partition-pruning.md
@@ -7,6 +7,15 @@ Partition pruning is a performance optimization that limits the number of files
  
 The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
 
+## Migrating Partitioned Data from Drill 1.1-1.2 to Drill 1.3
+Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade) to migrate Parquet data that you generated in Drill 1.1 or 1.2 before attempting to use the data with Drill 1.3 partition pruning.  This migration is mandatory because Parquet data generated by Drill 1.1 and 1.2 must be marked as Drill-generated, as described in [DRILL-4070](https://issues.apache.org/jira/browse/DRILL-4070).
+
+Drill 1.3 fixes a bug to accurately process Parquet files produced by other tools, such as Pig and Hive. The bug fix eliminated the risk of inaccurate metadata that could cause incorrect results when querying Hive- and Pig-generated Parquet files. No such risk exists with Drill-generated Parquet files. Querying Drill-generated Parquet files, regardless of the Drill version, yields accurate results. Drill-generated Parquet files, regardless of the Drill release, contain accurate metadata.
+
+After using the drill-upgrade tool to migrate your partitioned, pre-1.3 Parquet data, Drill can distinguish these files from those generated by other tools, such as Hive and Pig. Use the migration tool only on files generated by Drill. 
+
+To partition and query Parquet files generated from other tools, use Drill to read and rewrite the files and metadata using the CTAS command with the PARTITION BY clause. Alternatively, use the tool that generated the original files to regenerate Parquet 1.8 or later files.
+
 ## How to Partition Data
 
 In Drill 1.1.0 and later, if the data source is Parquet, no data organization tasks are required to take advantage of partition pruning. Write Parquet data using the [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause/) clause in the CTAS statement. 

http://git-wip-us.apache.org/repos/asf/drill/blob/1ae0e4eb/_docs/sql-reference/sql-commands/035-partition-by-clause.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-commands/035-partition-by-clause.md b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
index 20cd6a3..8ec97e3 100644
--- a/_docs/sql-reference/sql-commands/035-partition-by-clause.md
+++ b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
@@ -4,13 +4,17 @@ parent: "SQL Commands"
 ---
 The PARTITION BY clause in the CTAS command partitions data, which Drill [prunes]({{site.baseurl}}/docs/partition-pruning/) to improve performance when you query the data. (Drill 1.1.0)
 
-Note: parquet files produced using Drill 1.1 and 1.2 will need to have their metadata rewritten to work with partition pruning in 1.3 and beyond, information on the simple migration process can be found on [Github](https://github.com/parthchandra/drill-upgrade). The need for this migration process came out of a bug fix included in the 1.3 release to accurately process parquet files produced by other tools (like Pig and Hive) that had a risk of inaccurate metadata that could cause inaccurate
+{% include startnote.html %}Partitioned data generated in Drill 1.1-1.2 needs to be migrated to Drill 1.3 before attempting to use the data.{% include endnote.html %}
+
+See [Migrating Partitioned Data from Drill 1.1-1.2 to Drill 1.3]({{site.baseurl}}/docs/partition-pruning/#migrating-partitioned-parquet-data-from-drill-1-1-1-2-to-drill-1-3).
+
+<!-- Note: parquet files produced using Drill 1.1 and 1.2 will need to have their metadata rewritten to work with partition pruning in 1.3 and beyond, information on the simple migration process can be found on [Github](https://github.com/parthchandra/drill-upgrade). The need for this migration process came out of a bug fix included in the 1.3 release to accurately process parquet files produced by other tools (like Pig and Hive) that had a risk of inaccurate metadata that could cause inaccurate
 results. Drill results have been accurate on files it created, and the files all contain accurate metadata, the migration tool simply marks these files as having been produced by Drill. Unfortunately without this migration we cannot reliably tell them apart from files produced by other tools. The migration tool should only be used on files produced by Drill, not those produced with other software products. Data from other tools will need to be read in and completely rewritten to generate
-accurate metadata. This can be done using Drill or whatever tool originally produced them, as long as it is using a recent version of parquet (1.8 or later).
+accurate metadata. This can be done using Drill or whatever tool originally produced them, as long as it is using a recent version of parquet (1.8 or later). -->
 
 ## Syntax
 
-     [ PARTITION BY ( column_name[, . . .] ) ]
+`[ PARTITION BY ( column_name[, . . .] ) ]`
 
 The PARTITION BY clause partitions the data by the first column_name, and then subpartitions the data by the next column_name, if there is one, and so on. 
 

