drill-commits mailing list archives

From bridg...@apache.org
Subject [09/17] drill git commit: Update partition pruning intro for 1.8 - pp on parquet metadata cache
Date Tue, 30 Aug 2016 22:29:24 GMT
Update partition pruning intro for 1.8 - pp on parquet metadata cache


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/bb118573
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/bb118573
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/bb118573

Branch: refs/heads/gh-pages
Commit: bb11857308c2b0e05d758288973bf63fa65b73ea
Parents: 2347455
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Mon Aug 8 11:42:18 2016 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Mon Aug 8 11:42:18 2016 -0700

----------------------------------------------------------------------
 .../010-partition-pruning-introduction.md       | 47 +++++++++++---------
 1 file changed, 25 insertions(+), 22 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/bb118573/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
index 20e314b..315b062 100644
--- a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
+++ b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
@@ -1,22 +1,25 @@
----
-title: "Partition Pruning Introduction"
-date:  
-parent: "Partition Pruning"
---- 
-
-Partition pruning is a performance optimization that limits the number of files and partitions that Drill reads when querying file systems and Hive tables. When you partition data, Drill only reads a subset of the files that reside in a file system or a subset of the partitions in a Hive table when a query matches certain filter criteria.
-
-The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
-
-## Using Partitioned Drill Data
-Before using Parquet data created by Drill 1.2 or earlier in later releases, you need to migrate the data. Migrate Parquet data as described in ["Migrating Parquet Data"]({{site.baseurl}}/docs/migrating-parquet-data/).
-
-{% include startimportant.html %}Migrate only Parquet files that Drill generated.{% include endimportant.html %}
-
-## Partitioning Data
-In early versions of Drill, partition pruning involved time-consuming manual setup tasks. Using the PARTITION BY clause in the CTAS command simplifies the process.
-
-
-
-
-
+---
+title: "Partition Pruning Introduction"
+date: 2016-08-08 18:42:19 UTC
+parent: "Partition Pruning"
+--- 
+
+Partition pruning is a performance optimization that limits the number of files and partitions that Drill reads when querying file systems and Hive tables. When you partition data, Drill only reads a subset of the files that reside in a file system or a subset of the partitions in a Hive table when a query matches certain filter criteria.
+
+As of Drill 1.8, partition pruning also applies to the parquet metadata cache. See [Optimizing Parquet Metadata Reading]({{site.baseurl}}/docs/optimizing-parquet-metadata-reading/) to see how to create a parquet metadata cache. When data is partitioned in a directory hierarchy, Drill attempts to read the metadata cache file from a sub-partition, based on matching filter criteria instead of reading from the top level partition, to reduce the amount of metadata read during the query planning time.
+
+
+The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
+
+## Using Partitioned Drill Data
+Before using Parquet data created by Drill 1.2 or earlier in later releases, you need to migrate the data. Migrate Parquet data as described in ["Migrating Parquet Data"]({{site.baseurl}}/docs/migrating-parquet-data/).
+
+{% include startimportant.html %}Migrate only Parquet files that Drill generated.{% include endimportant.html %}
+
+## Partitioning Data
+In early versions of Drill, partition pruning involved time-consuming manual setup tasks. Using the PARTITION BY clause in the CTAS command simplifies the process.
+
+
+
+
+

