drill-commits mailing list archives

From bridg...@apache.org
Subject drill git commit: fix posted problems
Date Wed, 25 Nov 2015 22:31:43 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages 212bc621a -> 5f887a64b


fix posted problems


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/5f887a64
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/5f887a64
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/5f887a64

Branch: refs/heads/gh-pages
Commit: 5f887a64b3a2527bf81cd72729ea3d2ab4a60fce
Parents: 212bc62
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Wed Nov 25 14:26:41 2015 -0800
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Wed Nov 25 14:26:41 2015 -0800

----------------------------------------------------------------------
 .../035-plugin-configuration-basics.md                 | 13 ++-----------
 _docs/performance-tuning/020-partition-pruning.md      |  9 +++++++++
 .../sql-commands/035-partition-by-clause.md            | 10 +++++++---
 3 files changed, 18 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/5f887a64/_docs/connect-a-data-source/035-plugin-configuration-basics.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/035-plugin-configuration-basics.md b/_docs/connect-a-data-source/035-plugin-configuration-basics.md
index 9cbb195..12c6033 100644
--- a/_docs/connect-a-data-source/035-plugin-configuration-basics.md
+++ b/_docs/connect-a-data-source/035-plugin-configuration-basics.md
@@ -84,18 +84,13 @@ The following table describes the attributes you configure for storage plugins i
   </tr>
   <tr>
     <td>"formats"</td>
-<<<<<<< HEAD
-    <td>"psv"<br>"csv"<br>"tsv"<br>"parquet"<br>"json"<br>"avro"<br>"maprdb"<br>"sequencefile" *</td>
+    <td>"psv"<br>"csv"<br>"tsv"<br>"parquet"<br>"json"<br>"avro"<br>"maprdb"*<br>"sequencefile"</td>
     <td>yes</td>
-=======
-<td>"psv"<br>"csv"<br>"tsv"<br>"parquet"<br>"json"<br>"avro"<br>"maprdb"<br>"sequencefile" *</td>
-    <td>yes if type is file</td>
->>>>>>> remotes/apache/gh-pages
     <td>One or more valid file formats for reading. Drill implicitly detects formats of some files based on extension or bits of data in the file; others require configuration.</td>
   </tr>
   <tr>
     <td>"formats" . . . "type"</td>
-    <td>"text"<br>"parquet"<br>"json"<br>"maprdb"<br>"avro"<br>"sequencefile" *</td>
+    <td>"text"<br>"parquet"<br>"json"<br>"maprdb"*<br>"avro"<br>"sequencefile"</td>
     <td>yes</td>
     <td>Format type. You can define two formats, csv and psv, as type "Text", but having different delimiters. </td>
   </tr>
@@ -164,11 +159,7 @@ For example, using uppercase letters in the query after defining the storage plu
 
 ## Storage Plugin REST API
 
-<<<<<<< HEAD
 If you need to add a storage plugin configuration to Drill and do not want to use a web browser, you can use the [Drill REST API]({{site.baseurl}}/docs/rest-api/#get-status-threads) to create a storage plugin configuration. Use a POST request and pass two properties:
-=======
-If you need to add a storage plugin configuration to Drill and do not want to use a web browser, you can use the [Drill REST API]({{site.baseurl}}/docs/rest-api/) to create a storage plugin configuration. Use a POST request and pass two properties:
->>>>>>> remotes/apache/gh-pages
 
 * name  
   The storage plugin configuration name. 
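For readers following the diff, the "POST request with two properties" described in the section above can be sketched as follows. The plugin name `myplugin`, the workspace layout, and the default Web UI port 8047 are assumptions for illustration, not part of this commit.

```shell
# Hypothetical sketch: build the two-property payload (name, config) and
# post it to a Drillbit's REST endpoint. All names/paths are assumptions.
cat > /tmp/myplugin.json <<'EOF'
{
  "name": "myplugin",
  "config": {
    "type": "file",
    "enabled": true,
    "connection": "file:///",
    "workspaces": {
      "root": {"location": "/data", "writable": false, "defaultInputFormat": null}
    },
    "formats": {
      "json": {"type": "json"},
      "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","}
    }
  }
}
EOF
# Post the configuration (requires a running Drillbit; shown for reference):
# curl -X POST -H "Content-Type: application/json" \
#      -d @/tmp/myplugin.json http://localhost:8047/storage/myplugin.json
echo "payload written: $(wc -c < /tmp/myplugin.json) bytes"
```

The same payload shape works from any HTTP client; only the `name` and `config` properties are required by the documented request.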

http://git-wip-us.apache.org/repos/asf/drill/blob/5f887a64/_docs/performance-tuning/020-partition-pruning.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/020-partition-pruning.md b/_docs/performance-tuning/020-partition-pruning.md
old mode 100755
new mode 100644
index c86620c..26f681f
--- a/_docs/performance-tuning/020-partition-pruning.md
+++ b/_docs/performance-tuning/020-partition-pruning.md
@@ -7,6 +7,15 @@ Partition pruning is a performance optimization that limits the number of files
  
 The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
 
+## Migrating Partitioned Data from Drill 1.1-1.2 to Drill 1.3
+Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade) to migrate Parquet data that you generated in Drill 1.1 or 1.2 before attempting to use the data with Drill 1.3 partition pruning. This migration is mandatory because Parquet data generated by Drill 1.1 and 1.2 must be marked as Drill-generated, as described in [DRILL-4070](https://issues.apache.org/jira/browse/DRILL-4070).
+
+Drill 1.3 fixes a bug to accurately process Parquet files produced by other tools, such as Pig and Hive. The bug fix eliminated the risk of inaccurate metadata that could cause incorrect results when querying Hive- and Pig-generated Parquet files. No such risk exists with Drill-generated Parquet files. Querying Drill-generated Parquet files, regardless of the Drill release, yields accurate results because the files contain accurate metadata.
+
+After using the drill-upgrade tool to migrate your partitioned, pre-1.3 Parquet data, Drill can distinguish these files from those generated by other tools, such as Hive and Pig. Use the migration tool only on files generated by Drill.
+
+To partition and query Parquet files generated from other tools, use Drill to read and rewrite the files and metadata using the CTAS command with the PARTITION BY clause. Alternatively, use the tool that generated the original files to regenerate Parquet 1.8 or later files.
+
 ## How to Partition Data
 
 In Drill 1.1.0 and later, if the data source is Parquet, no data organization tasks are required to take advantage of partition pruning. Write Parquet data using the [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause/) clause in the CTAS statement.
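The CTAS-with-PARTITION-BY step the added docs describe can be sketched like this; the table, schema, and column names are assumptions for illustration, and the query is only printed here rather than run against a Drillbit.

```shell
# Hypothetical sketch: rewrite Parquet data with CTAS + PARTITION BY so the
# planner can prune directories. Names and paths are assumptions.
CTAS=$(cat <<'EOF'
CREATE TABLE dfs.tmp.sales_partitioned
PARTITION BY (sale_year)
AS SELECT EXTRACT(year FROM sale_date) AS sale_year, *
FROM dfs.`/data/sales_parquet`;
EOF
)
echo "$CTAS"
# To execute against a running Drillbit (not done here), pipe it to sqlline:
# sqlline -u jdbc:drill:zk=local <<< "$CTAS"
```

Note that the partition column must appear in the SELECT list, which is why `sale_year` is projected explicitly.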

http://git-wip-us.apache.org/repos/asf/drill/blob/5f887a64/_docs/sql-reference/sql-commands/035-partition-by-clause.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-commands/035-partition-by-clause.md b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
index 20cd6a3..8ec97e3 100644
--- a/_docs/sql-reference/sql-commands/035-partition-by-clause.md
+++ b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
@@ -4,13 +4,17 @@ parent: "SQL Commands"
 ---
 The PARTITION BY clause in the CTAS command partitions data, which Drill [prunes]({{site.baseurl}}/docs/partition-pruning/) to improve performance when you query the data. (Drill 1.1.0)
 
-Note: parquet files produced using Drill 1.1 and 1.2 will need to have their metadata rewritten to work with partition pruning in 1.3 and beyond, information on the simple migration process can be found on [Github](https://github.com/parthchandra/drill-upgrade). The need for this migration process came out of a bug fix included in the 1.3 release to accurately process parquet files produced by other tools (like Pig and Hive) that had a risk of inaccurate metadata that could cause inaccurate
+{% include startnote.html %}Partitioned data generated in Drill 1.1-1.2 needs to be migrated to Drill 1.3 before attempting to use the data.{% include endnote.html %}
+
+See [Migrating Partitioned Data from Drill 1.1-1.2 to Drill 1.3]({{site.baseurl}}/docs/partition-pruning/#migrating-partitioned-parquet-data-from-drill-1-1-1-2-to-drill-1-3).
+
+<!-- Note: parquet files produced using Drill 1.1 and 1.2 will need to have their metadata rewritten to work with partition pruning in 1.3 and beyond, information on the simple migration process can be found on [Github](https://github.com/parthchandra/drill-upgrade). The need for this migration process came out of a bug fix included in the 1.3 release to accurately process parquet files produced by other tools (like Pig and Hive) that had a risk of inaccurate metadata that could cause inaccurate
 results. Drill results have been accurate on files it created, and the files all contain accurate metadata, the migration tool simply marks these files as having been produced by Drill. Unfortunately without this migration we cannot reliably tell them apart from files produced by other tools. The migration tool should only be used on files produced by Drill, not those produced with other software products. Data from other tools will need to be read in and completely rewritten to generate
-accurate metadata. This can be done using Drill or whatever tool originally produced them, as long as it is using a recent version of parquet (1.8 or later).
+accurate metadata. This can be done using Drill or whatever tool originally produced them, as long as it is using a recent version of parquet (1.8 or later). -->
 
 ## Syntax
 
-     [ PARTITION BY ( column_name[, . . .] ) ]
+`[ PARTITION BY ( column_name[, . . .] ) ]`
 
 The PARTITION BY clause partitions the data by the first column_name, and then subpartitions the data by the next column_name, if there is one, and so on.
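The subpartitioning behavior described above (first column, then the next, and so on) can be illustrated with a two-column clause; the table, schema, and column names below are assumptions, and the statement is only printed, not executed.

```shell
# Hypothetical sketch: partition by year first, then subpartition by month.
# All identifiers are illustrative assumptions, not from this commit.
Q='CREATE TABLE dfs.tmp.logs_by_year_month
PARTITION BY (log_year, log_month)
AS SELECT EXTRACT(year FROM ts) AS log_year,
          EXTRACT(month FROM ts) AS log_month,
          *
FROM dfs.`/data/logs_parquet`;'
echo "$Q"
```

Rows land in directories keyed by `log_year`, with `log_month` subdirectories beneath them, which is what allows the planner to prune on either filter.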
 

