drill-commits mailing list archives

From tshi...@apache.org
Subject [25/30] drill git commit: Add note about parquet file migration in 1.3
Date Mon, 23 Nov 2015 21:54:08 GMT
Add note about parquet file migration in 1.3

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/df1072b7
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/df1072b7
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/df1072b7

Branch: refs/heads/gh-pages
Commit: df1072b7a260ef471040ec1e37d2b211275d71bc
Parents: dbce08e
Author: Jason Altekruse <altekrusejason@gmail.com>
Authored: Sun Nov 22 23:00:47 2015 -0800
Committer: Tomer Shiran <tshiran@gmail.com>
Committed: Mon Nov 23 10:11:44 2015 -0800

 _docs/sql-reference/sql-commands/035-partition-by-clause.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/_docs/sql-reference/sql-commands/035-partition-by-clause.md b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
index 32dceb5..20cd6a3 100644
--- a/_docs/sql-reference/sql-commands/035-partition-by-clause.md
+++ b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
@@ -4,6 +4,10 @@ parent: "SQL Commands"
 The PARTITION BY clause in the CTAS command partitions data, which Drill [prunes]({{site.baseurl}}/docs/partition-pruning/) to improve performance when you query the data. (Drill 1.1.0)
+Note: Parquet files produced by Drill 1.1 and 1.2 need to have their metadata rewritten to work with partition pruning in Drill 1.3 and later. Information on the simple migration process is available on [GitHub](https://github.com/parthchandra/drill-upgrade). This migration is required because of a bug fix in the 1.3 release that lets Drill accurately process Parquet files produced by other tools (such as Pig and Hive). Those files risked carrying inaccurate metadata, which could cause inaccurate
+results. Drill has returned accurate results on the files it created, and those files all contain accurate metadata; the migration tool simply marks them as having been produced by Drill. Unfortunately, without this migration Drill cannot reliably tell them apart from files produced by other tools. Use the migration tool only on files produced by Drill, not on files produced by other software. Data from other tools must be read in and completely rewritten to generate
+accurate metadata. You can do this with Drill or with the tool that originally produced the files, as long as it uses a recent version of Parquet (1.8 or later).
 ## Syntax
      [ PARTITION BY ( column_name[, . . .] ) ]
