drill-commits mailing list archives

From bridg...@apache.org
Subject [25/31] drill git commit: Add note about parquet file migration in 1.3
Date Wed, 25 Nov 2015 22:03:13 GMT
Add note about parquet file migration in 1.3

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/30309761
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/30309761
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/30309761

Branch: refs/heads/gh-pages
Commit: 3030976149324dd6cf8420ca9599365ff8a90956
Parents: e138bb9
Author: Jason Altekruse <altekrusejason@gmail.com>
Authored: Sun Nov 22 23:00:47 2015 -0800
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Wed Nov 25 10:39:15 2015 -0800

 _docs/sql-reference/sql-commands/035-partition-by-clause.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/_docs/sql-reference/sql-commands/035-partition-by-clause.md b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
index 32dceb5..20cd6a3 100644
--- a/_docs/sql-reference/sql-commands/035-partition-by-clause.md
+++ b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
@@ -4,6 +4,10 @@ parent: "SQL Commands"
 The PARTITION BY clause in the CTAS command partitions data, which Drill [prunes]({{site.baseurl}}/docs/partition-pruning/) to improve performance when you query the data. (Drill 1.1.0)
+Note: parquet files produced using Drill 1.1 and 1.2 will need to have their metadata rewritten to work with partition pruning in 1.3 and beyond, information on the simple migration process can be found on [Github](https://github.com/parthchandra/drill-upgrade). The need for this migration process came out of a bug fix included in the 1.3 release to accurately process parquet files produced by other tools (like Pig and Hive) that had a risk of inaccurate metadata that could cause inaccurate
+results. Drill results have been accurate on files it created, and the files all contain accurate metadata, the migration tool simply marks these files as having been produced by Drill. Unfortunately without this migration we cannot reliably tell them apart from files produced by other tools. The migration tool should only be used on files produced by Drill, not those produced with other software products. Data from other tools will need to be read in and completely rewritten to generate
+accurate metadata. This can be done using Drill or whatever tool originally produced them, as long as it is using a recent version of parquet (1.8 or later).
 ## Syntax
      [ PARTITION BY ( column_name[, . . .] ) ]
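
Editor's illustration (not part of the commit): the note says that Parquet data written by other tools must be read in and completely rewritten, and that Drill itself can do this. A minimal sketch of such a rewrite using CTAS with PARTITION BY might look like the following. The source path, target table name, and column names are hypothetical, and the sketch assumes a writable `dfs.tmp` workspace and that the partition column appears in the SELECT list.

```sql
-- Hypothetical example: names and paths are placeholders, not from the commit.
-- Make sure CTAS output is written as Parquet.
ALTER SESSION SET `store.format` = 'parquet';

-- Read the files produced by another tool and rewrite them with Drill,
-- so that the new files carry metadata generated by Drill.
CREATE TABLE dfs.tmp.`events_rewritten`
PARTITION BY (event_year)
AS SELECT event_year, event_type, payload
FROM dfs.`/data/hive_output/events`;
```

Files that Drill 1.1 or 1.2 wrote do not need this full rewrite; per the note, those only need the metadata migration tool linked in the diff.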
