parquet-commits mailing list archives

From jul...@apache.org
Subject git commit: PARQUET-119: add data_encodings to ColumnMetaData to enable dictionary based predicate push down
Date Thu, 30 Oct 2014 20:49:38 GMT
Repository: incubator-parquet-format
Updated Branches:
  refs/heads/master 3789d5aac -> f7ab552f5


PARQUET-119: add data_encodings to ColumnMetaData to enable dictionary based predicate push down

To implement predicate push down based on the dictionary, we need to know whether the writer
fell back from dictionary encoding. If all data pages are dictionary encoded, we can use the
dictionary for predicate push down. If not, we cannot.
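
The check described above could be sketched as follows. This is a minimal Python illustration under stated assumptions, not parquet-format or parquet-mr code: the `PageEncodingStats` dataclass only mirrors the Thrift struct added in this commit, the string constants stand in for `PageType` and `Encoding` values, and `all_data_pages_dictionary_encoded` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: these strings stand in for PageType / Encoding enum values.
DATA_PAGE_TYPES = {"DATA_PAGE", "DATA_PAGE_V2"}
DICTIONARY_ENCODINGS = {"PLAIN_DICTIONARY", "RLE_DICTIONARY"}


@dataclass(frozen=True)
class PageEncodingStats:
    """Illustrative mirror of the Thrift struct: one (page type, encoding) count."""
    page_type: str
    encoding: str
    count: int


def all_data_pages_dictionary_encoded(
    encoding_stats: Optional[Iterable[PageEncodingStats]],
) -> bool:
    """Return True only if every data page is known to be dictionary encoded."""
    # The field is optional; when it is absent we cannot rule out fallback.
    if encoding_stats is None:
        return False
    saw_data_page = False
    for stat in encoding_stats:
        # Dictionary pages and empty entries say nothing about data pages.
        if stat.page_type not in DATA_PAGE_TYPES or stat.count <= 0:
            continue
        saw_data_page = True
        if stat.encoding not in DICTIONARY_ENCODINGS:
            return False  # at least one data page fell back
    return saw_data_page


# Hypothetical column chunks: one fully dictionary encoded, one with fallback.
no_fallback = [
    PageEncodingStats("DICTIONARY_PAGE", "PLAIN", 1),
    PageEncodingStats("DATA_PAGE", "PLAIN_DICTIONARY", 4),
]
with_fallback = no_fallback + [PageEncodingStats("DATA_PAGE", "PLAIN", 2)]
```

The helper is deliberately conservative: a missing or empty list is treated as "unknown", so a reader would skip dictionary-based filtering rather than risk dropping matching rows.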

CC @nongli @rdblue @isnotinvain @tsdeng

Author: julien <julien@twitter.com>

Closes #16 from julienledem/data_encodings and squashes the following commits:

3a60c6c [julien] typo
46f7b7a [julien] update to stats based on feedback
6474f58 [julien] Merge branch 'master' into data_encodings
3529ccf [julien] make data_encodings optional
709dd7c [julien] add data_encodings to ColumnMetaData to enable dictionary based predicate push down


Project: http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/commit/f7ab552f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/tree/f7ab552f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/diff/f7ab552f

Branch: refs/heads/master
Commit: f7ab552f569df63bdb59f751d0dd36e826682739
Parents: 3789d5a
Author: julien <julien@twitter.com>
Authored: Thu Oct 30 13:49:26 2014 -0700
Committer: julien <julien@twitter.com>
Committed: Thu Oct 30 13:49:26 2014 -0700

----------------------------------------------------------------------
 src/thrift/parquet.thrift | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/blob/f7ab552f/src/thrift/parquet.thrift
----------------------------------------------------------------------
diff --git a/src/thrift/parquet.thrift b/src/thrift/parquet.thrift
index 20a7848..7544cf3 100644
--- a/src/thrift/parquet.thrift
+++ b/src/thrift/parquet.thrift
@@ -430,6 +430,22 @@ struct SortingColumn {
 }
 
 /**
+ * statistics of a given page type and encoding
+ */
+struct PageEncodingStats {
+
+  /** the page type (data/dic/...) **/
+  1: required PageType page_type;
+
+  /** encoding of the page **/
+  2: required Encoding encoding;
+
+  /** number of pages of this type with this encoding **/
+  3: required i32 count;
+
+}
+
+/**
  * Description for column metadata
  */
 struct ColumnMetaData {
@@ -469,6 +485,11 @@ struct ColumnMetaData {
 
   /** optional statistics for this column chunk */
   12: optional Statistics statistics;
+
+  /** Set of all encodings used for pages in this column chunk.
+   * This information can be used to determine if all data pages are
+   * dictionary encoded for example **/
+  13: optional list<PageEncodingStats> encoding_stats;
 }
 
 struct ColumnChunk {
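
On the writing side, the `encoding_stats` list could be produced by counting pages per (page type, encoding) pair. The following is a minimal Python sketch under that assumption: `build_encoding_stats` and the string page-type and encoding names are hypothetical, and the `PageEncodingStats` tuple only mirrors the three fields of the Thrift struct in the diff above.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class PageEncodingStats(NamedTuple):
    """Illustrative mirror of the Thrift struct added in this commit."""
    page_type: str
    encoding: str
    count: int


def build_encoding_stats(pages: Iterable[Tuple[str, str]]) -> List[PageEncodingStats]:
    """Aggregate (page_type, encoding) pairs into one entry per distinct pair."""
    counts = Counter(pages)
    # Counter keeps first-seen order, so entries follow the order pages were written.
    return [
        PageEncodingStats(page_type, encoding, count)
        for (page_type, encoding), count in counts.items()
    ]


# Hypothetical column chunk: dictionary encoding for two pages, then fallback to PLAIN.
written_pages = [
    ("DICTIONARY_PAGE", "PLAIN"),
    ("DATA_PAGE", "PLAIN_DICTIONARY"),
    ("DATA_PAGE", "PLAIN_DICTIONARY"),
    ("DATA_PAGE", "PLAIN"),
]
stats = build_encoding_stats(written_pages)
```

Keeping one entry per distinct pair, rather than one per page, keeps the metadata small while still letting a reader see whether any data page used a non-dictionary encoding.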

