impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Date Fri, 05 May 2017 21:02:11 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining
types
......................................................................


Patch Set 6:

(10 comments)

getting close

http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 543:     if (col_idx < col_orders.size()) col_order = &col_orders[col_idx];
what does this mean if col_orders is not empty? (does the spec allow a partial col_orders?)


http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 1016:   file_metadata_.__isset.column_orders = true;
do we have a test that checks the integrity of generated parquet data? if so, we should also
check for the existence of this field.


http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.cc
File be/src/exec/parquet-column-stats.cc:

Line 104:         StringValue* result = reinterpret_cast<StringValue*>(slot);
why no padding here?

do stats for char columns make sense, given that our char comparisons are broken?


http://gerrit.cloudera.org:8080/#/c/6563/4/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 34: /// decode parquet::Statistics into slots.
> Done
i meant: update the comment to reflect the changes in this patch.


Line 108: 
> Done
please state explicitly that T must be one of the *Value classes


http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 105:   /// 'col_order'. Otherwise, returns false.
what if col_order == 0?


Line 172:   static void CopyToBuffer(StringBuffer* buffer, StringValue* value);
move to -base and remove template params in .inl.h?


http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

Line 123:   // The ParquetPlainEncoder interface expects mutable pointers.
several function signatures in parquet-common.h look wrong (output args at beginning, missing
const, etc.)

please leave a todo there to fix it (or simply fix it - the fact that it's wrong is spilling
over into your new code, which isn't good).


http://gerrit.cloudera.org:8080/#/c/6563/4/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 290: select count(*) from functional_parquet.alltypessmall where string_col < "1"
> I'm not sure I understand your comment. The test in L274 filters all rows u
nm


http://gerrit.cloudera.org:8080/#/c/6563/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 418: # Test CHAR type columns.
our char implementation is broken, i'm not sure these tests are meaningful.


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message