drill-commits mailing list archives

From bridg...@apache.org
Subject [11/17] drill git commit: update to partition pruning intro to include refresh command for metadata cache file
Date Tue, 30 Aug 2016 22:29:26 GMT
update to partition pruning intro to include refresh command for metadata cache file


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/2bc38da0
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/2bc38da0
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/2bc38da0

Branch: refs/heads/gh-pages
Commit: 2bc38da0e9ff9159b2337f3285aaaae05e5979aa
Parents: 21c41f5
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Thu Aug 11 12:02:19 2016 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Thu Aug 11 12:02:19 2016 -0700

----------------------------------------------------------------------
 .../partition-pruning/010-partition-pruning-introduction.md     | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/2bc38da0/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
index 315b062..e5f4e5f 100644
--- a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
+++ b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
@@ -1,13 +1,12 @@
 ---
 title: "Partition Pruning Introduction"
-date: 2016-08-08 18:42:19 UTC
+date: 2016-08-11 19:02:20 UTC
 parent: "Partition Pruning"
 --- 
 
 Partition pruning is a performance optimization that limits the number of files and partitions that Drill reads when querying file systems and Hive tables. When you partition data, Drill only reads a subset of the files that reside in a file system or a subset of the partitions in a Hive table when a query matches certain filter criteria.
 
-As of Drill 1.8, partition pruning also applies to the parquet metadata cache. See [Optimizing Parquet Metadata Reading]({{site.baseurl}}/docs/optimizing-parquet-metadata-reading/) to see how to create a parquet metadata cache. When data is partitioned in a directory hierarchy, Drill attempts to read the metadata cache file from a sub-partition, based on matching filter criteria instead of reading from the top level partition, to reduce the amount of metadata read during the query planning time. 
-
+As of Drill 1.8, partition pruning also applies to the Parquet metadata cache. When data is partitioned in a directory hierarchy, Drill attempts to read the metadata cache file from a sub-partition, based on matching filter criteria instead of reading from the top level partition, to reduce the amount of metadata read during the query planning time. If you created a metadata cache file in a previous version of Drill, you must issue the REFRESH TABLE METADATA command to regenerate the metadata cache file before running queries for partition pruning to occur. See [Optimizing Parquet Metadata Reading]({{site.baseurl}}/docs/optimizing-parquet-metadata-reading/) for more information.  
 
 The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
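
The pruning behavior described in the last paragraph can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Drill's planner code: the function `prune_partitions`, the sample file paths, and the representation of filters as a mapping from directory level to a required directory name are all hypothetical and chosen for illustration.

```python
from pathlib import PurePosixPath


def prune_partitions(files, partition_filters):
    """Toy model of directory-based partition pruning (hypothetical helper).

    files: paths relative to the table root, e.g. "2016/Q1/orders.parquet".
    partition_filters: maps a directory level (0 = first level under the
    table root) to the directory name that level must equal.
    """
    selected = []
    for path in files:
        # Directory components of the path, without the file name.
        dirs = PurePosixPath(path).parts[:-1]
        # Keep the file only if every filtered level exists and matches.
        if all(
            level < len(dirs) and dirs[level] == value
            for level, value in partition_filters.items()
        ):
            selected.append(path)
    return selected


# Hypothetical table laid out as <year>/<quarter>/<file>.
files = [
    "2015/Q1/orders.parquet",
    "2015/Q2/orders.parquet",
    "2016/Q1/orders.parquet",
    "2016/Q2/orders.parquet",
]

# With a filter on the first directory level, only matching directories are read.
print(prune_partitions(files, {0: "2016"}))

# With no partition filters, every file in every directory is read.
print(prune_partitions(files, {}))
```

In this model, an empty filter set corresponds to the case where the Scan reads all files and leaves filtering to downstream operators, while a non-empty filter set corresponds to the filters being pushed down so that non-matching directories are never read.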
 

